Closed Bug 1846328 Opened 2 years ago Closed 1 year ago

Crash in [@ mozilla::dom::WebSocketImpl::Dispatch]

Categories

(Core :: Networking: WebSockets, defect, P2)

68 Branch
Unspecified
All
defect

Tracking

()

RESOLVED FIXED
122 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox-esr115 121+ fixed
firefox116 --- wontfix
firefox117 --- wontfix
firefox118 --- wontfix
firefox119 --- wontfix
firefox120 --- wontfix
firefox121 + fixed
firefox122 + fixed

People

(Reporter: jesup, Assigned: smaug)

References

Details

(4 keywords, Whiteboard: [necko-triaged][necko-priority-queue][adv-main121+r][adv-esr115.6+r])

Crash Data

Attachments

(2 files, 1 obsolete file)

Cloned bug 1555322 instead of reopening...

We're getting a steady stream of e5e5 UAFs from ::Dispatch. 90% are from DOM Workers calling Close(); the others are mostly mainthread and more scattered

Severity: -- → S2
Priority: -- → P2
Whiteboard: [necko-priority-review][necko-triaged]

:jesup, since this bug is a regression, could you fill (if possible) the regressed_by field?
For more information, please visit BugBot documentation.

Flags: needinfo?(rjesup)

This bug has been marked as a regression. Setting status flag for Nightly to affected.

Still need investigation and assignemnt

Whiteboard: [necko-priority-review][necko-triaged] → [necko-priority-queue][necko-triaged]

For this crash https://crash-stats.mozilla.org/report/index/899c2425-9475-43b4-8736-511740230731
https://searchfox.org/mozilla-central/rev/503938c13ef2dd174705dc0f6d0683ae43074ccc/dom/websocket/WebSocket.cpp#2509

mImpl->FailConnection(closeCode, closeReason);

mImpl is a raw pointer.
This is happening on the DOM Worker thread, so I'm wondering if this is either a bug where we improperly release mImpl, or a race (we seem to be calling mImpl->ReleaseObject(); only on the main thread).

(In reply to Ed Guloien [:edgul] from comment #3)

Still need investigation and assignemnt

:edgul what it is the timeline on investigation?
Wondering if we can expect something during the rest of the Fx118 cycle?

Kershaw is going to have a look. I'll defer to him.

Flags: needinfo?(kershaw)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #4)

For this crash https://crash-stats.mozilla.org/report/index/899c2425-9475-43b4-8736-511740230731
https://searchfox.org/mozilla-central/rev/503938c13ef2dd174705dc0f6d0683ae43074ccc/dom/websocket/WebSocket.cpp#2509

mImpl->FailConnection(closeCode, closeReason);

mImpl is a raw pointer.
This is happening on the DOM Worker thread, so I'm wondering if this is either a bug where we improperly release mImpl, or a race (we seem to be calling mImpl->ReleaseObject(); only on the main thread).

I think we call ReleaseObject either on the main thread or the worker thread.
I'll create a patch to add more release assertions to ensure this.

Flags: needinfo?(kershaw)
Assignee: nobody → kershaw
Status: NEW → ASSIGNED

Comment on attachment 9349751 [details]
Bug 1846328 - Add more release assertions, r=#necko

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Unknown. This is a diagnostic patch.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: No need to uplift.
  • How likely is this patch to cause regressions; how much testing does it need?: Unknown. This is a diagnostic patch.
  • Is Android affected?: Yes
Attachment #9349751 - Flags: sec-approval?

Comment on attachment 9349751 [details]
Bug 1846328 - Add more release assertions, r=#necko

sec-approval+ = dveditz

Attachment #9349751 - Flags: sec-approval? → sec-approval+
Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ae769caa17a3 Add more release assertions, r=necko-reviewers,jesup

We just landed the diagnostic patch a few days ago.
I think we'll need at least a month or two to wait the assertions to be hit.

Whiteboard: [necko-priority-queue][necko-triaged] → [necko-monitor][necko-triaged]

(In reply to Kershaw Chang [:kershaw] from comment #13)

We just landed the diagnostic patch a few days ago.
I think we'll need at least a month or two to wait the assertions to be hit.

No assertions hit so far. We'll need to wait 1 more month.

Flags: needinfo?(rjesup)

Kershaw, now that a month has passed, do you see any hits? What are the recommended next steps?

Flags: needinfo?(kershaw)

(In reply to Neha Kochar [:neha] from comment #15)

Kershaw, now that a month has passed, do you see any hits? What are the recommended next steps?

The latest crash doesn't include the diagnostic patch added in this bug and I also don't see any hits.
Given the low crash rate, I think we still need more time.
I'll keep monitoring this bug and update status.

Flags: needinfo?(kershaw)

Unfortunately, we still have some new crashes. This means that the diagnostic patch doesn't work, so we'll need to come up with a new approach.

However, the diagnostic patch did trigger a crash.
The crash frames can be found below.
It shows that WebSocketImpl::ReleaseObject is called on main thread, but mIsMainThread is false. When mIsMainThread is false, this means that the WebSocketImpl object was created on a worker thread. So, I don't understand why JS code running on main thread (WebSocket_Binding::set_onerror) can access the object that was created on a worker thread.

Randell, do you maybe know what's wrong here? Should we just dispatch WebSocketImpl::ReleaseObject() to the target thread?
Thanks.

0 	libxul.so 	mozilla::dom::WebSocketImpl::ReleaseObject() 	dom/websocket/WebSocket.cpp:2245 	context
1 	libxul.so 	mozilla::dom::WebSocket::UpdateMustKeepAlive() 	dom/websocket/WebSocket.cpp:2218 	cfi
2 	libxul.so 	mozilla::EventListenerManager::NotifyEventListenerRemoved(nsAtom*) 	dom/events/EventListenerManager.cpp:814 	cfi
3 	libxul.so 	mozilla::EventListenerManager::RemoveEventHandler(nsAtom*) 	dom/events/EventListenerManager.cpp:1115 	cfi
4 	libxul.so 	mozilla::dom::WebSocket::SetOnerror(mozilla::dom::EventHandlerNonNull*) 	dom/websocket/WebSocket.h:101 	inlined
4 	libxul.so 	mozilla::dom::WebSocket_Binding::set_onerror(JSContext*, JS::Handle<JSObject*>, void*, JSJitSetterCallArgs) 	dom/bindings/WebSocketBinding.cpp:290 	cfi
5 	libxul.so 	mozilla::dom::binding_detail::GenericSetter<mozilla::dom::binding_detail::NormalThisPolicy>(JSContext*, unsigned int, JS::Value*) 	dom/bindings/BindingUtils.cpp:3275 	cfi
6 	libxul.so 	CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) 	js/src/vm/Interpreter.cpp:472 	inlined
6 	libxul.so 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) 	js/src/vm/Interpreter.cpp:566 	inlined
Flags: needinfo?(rjesup)
Whiteboard: [necko-monitor][necko-triaged] → [necko-triaged][necko-priority-queue]

The base problem is almost certainly the raw WebSocketImpl* mImpl in the WebSocket; this is dangerous. If mKeepingAlive is false, and we have a RefPtr to the Impl anywhere, when the RefPtr releases, it will delete WebSocketImpl, but mImpl will still point to the dead/deleted Impl.

Flags: needinfo?(rjesup)

FWIW I haven't yet seen a case where mImpl being raw pointer could cause issues.
When WebSocketImpl is deleted, WebSocket::mImpl is cleared.

I filed bug 1867927 for a simple safe null pointer crash case.

But, I think https://searchfox.org/mozilla-central/rev/033cc91cd088a992ab8728bab9821624cee62ce5/dom/websocket/WebSocket.cpp#2523 can be problematic. Not because mImpl is a raw pointer, but because nothing keeps WebSocketImpl alive on stack. We need a RefPtr there since in FailConnection() ConsoleError() call may (I think) trigger deletion of WebSocketImpl, and then we keep executing FailConnection() and there is CloseConnection() call which uses Dispatch.
(Need to check all the similar impl->Foo() calls)

There might be some other unrelated issues too.

Assignee: kershaw → rjesup

Comment on attachment 9366655 [details]
Bug 1846328, follow COM-rules when calling impl methods, r=jesup

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: quite hard; it's a very rare crash and adding a refptr indicates there's a possible UAF, but finding a way to get such a UAF is likely very hard
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: trivial
  • How likely is this patch to cause regressions; how much testing does it need?: no chance of regression
  • Is Android affected?: Yes
Attachment #9366655 - Flags: sec-approval?

Comment on attachment 9366655 [details]
Bug 1846328, follow COM-rules when calling impl methods, r=jesup

approved to land and request uplift

Attachment #9366655 - Flags: sec-approval? → sec-approval+
Blocks: 1868177

Comment on attachment 9366674 [details]
Bug 1846328: Add ProofOfRef arguments to WebSocketImpl methods r=smaug!

Revision D195346 was moved to bug 1868177. Setting attachment 9366674 [details] to obsolete.

Attachment #9366674 - Attachment is obsolete: true
Pushed by opettay@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/56ceb0d2ed32 follow COM-rules when calling impl methods, r=jesup

Looks like we may want Beta and ESR approval requests on this if we're going to get any timely indication that it's fixed for good. Please nominate if you agree.

Comment on attachment 9366655 [details]
Bug 1846328, follow COM-rules when calling impl methods, r=jesup

Beta/Release Uplift Approval Request

  • User impact if declined: Security risk
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Adds a refptr
  • String changes made/needed: none
  • Is Android affected?: Yes

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  • User impact if declined: Security risk
  • Fix Landed on Version: 122
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Adds a refptr
Flags: needinfo?(rjesup)
Attachment #9366655 - Flags: approval-mozilla-esr115?
Attachment #9366655 - Flags: approval-mozilla-beta?
Group: network-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 122 Branch
Assignee: rjesup → smaug

I'll ask approval soon. Or ask jesup to ask :)

Comment on attachment 9366655 [details]
Bug 1846328, follow COM-rules when calling impl methods, r=jesup

Approved for 121.0b8 and 115.6esr, thanks :-)

Attachment #9366655 - Flags: approval-mozilla-esr115?
Attachment #9366655 - Flags: approval-mozilla-esr115+
Attachment #9366655 - Flags: approval-mozilla-beta?
Attachment #9366655 - Flags: approval-mozilla-beta+
Flags: qe-verify-
QA Whiteboard: [post-critsmash-triage]
Whiteboard: [necko-triaged][necko-priority-queue] → [necko-triaged][necko-priority-queue][adv-main121+r]
Whiteboard: [necko-triaged][necko-priority-queue][adv-main121+r] → [necko-triaged][necko-priority-queue][adv-main121+r][adv-esr115.6+r]

Bulk-unhiding security bugs fixed in Firefox 119-121 (Fall 2023). Use "moo-doctrine-subsidy" to filter

Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: