Crash in [@ mozilla::dom::WebSocketImpl::Dispatch]
Categories
(Core :: Networking: WebSockets, defect, P2)
Tracking
()
People
(Reporter: jesup, Assigned: smaug)
References
Details
(4 keywords, Whiteboard: [necko-triaged][necko-priority-queue][adv-main121+r][adv-esr115.6+r])
Crash Data
Attachments
(2 files, 1 obsolete file)
48 bytes,
text/x-phabricator-request
|
dveditz
:
sec-approval+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr115+
tjr
:
sec-approval+
|
Details | Review |
Cloned bug 1555322 instead of reopening...
We're getting a steady stream of e5e5 UAFs from ::Dispatch. 90% are from DOM Workers calling Close(); the others are mostly mainthread and more scattered
Reporter | ||
Updated•2 years ago
|
Comment 1•2 years ago
|
||
:jesup, since this bug is a regression, could you fill (if possible) the regressed_by field?
For more information, please visit BugBot documentation.
Comment 2•2 years ago
|
||
This bug has been marked as a regression. Setting status flag for Nightly to affected
.
Still need investigation and assignemnt
Comment 4•2 years ago
|
||
For this crash https://crash-stats.mozilla.org/report/index/899c2425-9475-43b4-8736-511740230731
https://searchfox.org/mozilla-central/rev/503938c13ef2dd174705dc0f6d0683ae43074ccc/dom/websocket/WebSocket.cpp#2509
mImpl->FailConnection(closeCode, closeReason);
mImpl
is a raw pointer.
This is happening on the DOM Worker thread, so I'm wondering if this is either a bug where we improperly release mImpl, or a race (we seem to be calling mImpl->ReleaseObject(); only on the main thread).
Comment 5•2 years ago
|
||
(In reply to Ed Guloien [:edgul] from comment #3)
Still need investigation and assignemnt
:edgul what it is the timeline on investigation?
Wondering if we can expect something during the rest of the Fx118 cycle?
Kershaw is going to have a look. I'll defer to him.
Comment 7•2 years ago
|
||
(In reply to Valentin Gosu [:valentin] (he/him) from comment #4)
For this crash https://crash-stats.mozilla.org/report/index/899c2425-9475-43b4-8736-511740230731
https://searchfox.org/mozilla-central/rev/503938c13ef2dd174705dc0f6d0683ae43074ccc/dom/websocket/WebSocket.cpp#2509mImpl->FailConnection(closeCode, closeReason);
mImpl
is a raw pointer.
This is happening on the DOM Worker thread, so I'm wondering if this is either a bug where we improperly release mImpl, or a race (we seem to be calling mImpl->ReleaseObject(); only on the main thread).
I think we call ReleaseObject either on the main thread or the worker thread.
I'll create a patch to add more release assertions to ensure this.
Comment 8•2 years ago
|
||
Updated•2 years ago
|
Comment 9•2 years ago
|
||
Comment on attachment 9349751 [details]
Bug 1846328 - Add more release assertions, r=#necko
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Unknown. This is a diagnostic patch.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: No need to uplift.
- How likely is this patch to cause regressions; how much testing does it need?: Unknown. This is a diagnostic patch.
- Is Android affected?: Yes
Updated•2 years ago
|
Comment 10•1 years ago
|
||
Comment on attachment 9349751 [details]
Bug 1846328 - Add more release assertions, r=#necko
sec-approval+ = dveditz
Comment 11•1 years ago
|
||
![]() |
||
Comment 12•1 years ago
|
||
Comment 13•1 years ago
|
||
We just landed the diagnostic patch a few days ago.
I think we'll need at least a month or two to wait the assertions to be hit.
Updated•1 years ago
|
Updated•1 years ago
|
Comment 14•1 year ago
|
||
(In reply to Kershaw Chang [:kershaw] from comment #13)
We just landed the diagnostic patch a few days ago.
I think we'll need at least a month or two to wait the assertions to be hit.
No assertions hit so far. We'll need to wait 1 more month.
Updated•1 year ago
|
Reporter | ||
Updated•1 year ago
|
Comment 15•1 year ago
|
||
Kershaw, now that a month has passed, do you see any hits? What are the recommended next steps?
Comment 16•1 year ago
|
||
(In reply to Neha Kochar [:neha] from comment #15)
Kershaw, now that a month has passed, do you see any hits? What are the recommended next steps?
The latest crash doesn't include the diagnostic patch added in this bug and I also don't see any hits.
Given the low crash rate, I think we still need more time.
I'll keep monitoring this bug and update status.
Updated•1 year ago
|
Comment 17•1 year ago
|
||
Unfortunately, we still have some new crashes. This means that the diagnostic patch doesn't work, so we'll need to come up with a new approach.
However, the diagnostic patch did trigger a crash.
The crash frames can be found below.
It shows that WebSocketImpl::ReleaseObject is called on main thread, but mIsMainThread
is false. When mIsMainThread
is false, this means that the WebSocketImpl
object was created on a worker thread. So, I don't understand why JS code running on main thread (WebSocket_Binding::set_onerror
) can access the object that was created on a worker thread.
Randell, do you maybe know what's wrong here? Should we just dispatch WebSocketImpl::ReleaseObject()
to the target thread?
Thanks.
0 libxul.so mozilla::dom::WebSocketImpl::ReleaseObject() dom/websocket/WebSocket.cpp:2245 context
1 libxul.so mozilla::dom::WebSocket::UpdateMustKeepAlive() dom/websocket/WebSocket.cpp:2218 cfi
2 libxul.so mozilla::EventListenerManager::NotifyEventListenerRemoved(nsAtom*) dom/events/EventListenerManager.cpp:814 cfi
3 libxul.so mozilla::EventListenerManager::RemoveEventHandler(nsAtom*) dom/events/EventListenerManager.cpp:1115 cfi
4 libxul.so mozilla::dom::WebSocket::SetOnerror(mozilla::dom::EventHandlerNonNull*) dom/websocket/WebSocket.h:101 inlined
4 libxul.so mozilla::dom::WebSocket_Binding::set_onerror(JSContext*, JS::Handle<JSObject*>, void*, JSJitSetterCallArgs) dom/bindings/WebSocketBinding.cpp:290 cfi
5 libxul.so mozilla::dom::binding_detail::GenericSetter<mozilla::dom::binding_detail::NormalThisPolicy>(JSContext*, unsigned int, JS::Value*) dom/bindings/BindingUtils.cpp:3275 cfi
6 libxul.so CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) js/src/vm/Interpreter.cpp:472 inlined
6 libxul.so js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) js/src/vm/Interpreter.cpp:566 inlined
Updated•1 year ago
|
Reporter | ||
Comment 18•1 year ago
|
||
The base problem is almost certainly the raw WebSocketImpl* mImpl in the WebSocket; this is dangerous. If mKeepingAlive is false, and we have a RefPtr to the Impl anywhere, when the RefPtr releases, it will delete WebSocketImpl, but mImpl will still point to the dead/deleted Impl.
Assignee | ||
Comment 19•1 year ago
|
||
FWIW I haven't yet seen a case where mImpl being raw pointer could cause issues.
When WebSocketImpl is deleted, WebSocket::mImpl is cleared.
I filed bug 1867927 for a simple safe null pointer crash case.
Assignee | ||
Comment 20•1 year ago
•
|
||
But, I think https://searchfox.org/mozilla-central/rev/033cc91cd088a992ab8728bab9821624cee62ce5/dom/websocket/WebSocket.cpp#2523 can be problematic. Not because mImpl is a raw pointer, but because nothing keeps WebSocketImpl alive on stack. We need a RefPtr there since in FailConnection() ConsoleError() call may (I think) trigger deletion of WebSocketImpl, and then we keep executing FailConnection() and there is CloseConnection() call which uses Dispatch.
(Need to check all the similar impl->Foo() calls)
There might be some other unrelated issues too.
Assignee | ||
Comment 21•1 year ago
|
||
Reporter | ||
Comment 22•1 year ago
|
||
Reporter | ||
Updated•1 year ago
|
Reporter | ||
Comment 23•1 year ago
|
||
Comment on attachment 9366655 [details]
Bug 1846328, follow COM-rules when calling impl methods, r=jesup
Security Approval Request
- How easily could an exploit be constructed based on the patch?: quite hard; it's a very rare crash and adding a refptr indicates there's a possible UAF, but finding a way to get such a UAF is likely very hard
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: trivial
- How likely is this patch to cause regressions; how much testing does it need?: no chance of regression
- Is Android affected?: Yes
Comment 24•1 year ago
|
||
Comment on attachment 9366655 [details]
Bug 1846328, follow COM-rules when calling impl methods, r=jesup
approved to land and request uplift
Comment 25•1 year ago
|
||
Comment on attachment 9366674 [details]
Bug 1846328: Add ProofOfRef arguments to WebSocketImpl methods r=smaug!
Revision D195346 was moved to bug 1868177. Setting attachment 9366674 [details] to obsolete.
Comment 26•1 year ago
|
||
Comment 27•1 year ago
|
||
Looks like we may want Beta and ESR approval requests on this if we're going to get any timely indication that it's fixed for good. Please nominate if you agree.
Reporter | ||
Comment 28•1 year ago
|
||
Comment on attachment 9366655 [details]
Bug 1846328, follow COM-rules when calling impl methods, r=jesup
Beta/Release Uplift Approval Request
- User impact if declined: Security risk
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Adds a refptr
- String changes made/needed: none
- Is Android affected?: Yes
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration:
- User impact if declined: Security risk
- Fix Landed on Version: 122
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Adds a refptr
![]() |
||
Comment 29•1 year ago
|
||
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 30•1 year ago
•
|
||
I'll ask approval soon. Or ask jesup to ask :)
Comment 31•1 year ago
|
||
Comment on attachment 9366655 [details]
Bug 1846328, follow COM-rules when calling impl methods, r=jesup
Approved for 121.0b8 and 115.6esr, thanks :-)
Comment 32•1 year ago
|
||
uplift |
Updated•1 year ago
|
Comment 33•1 year ago
|
||
uplift |
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 34•10 months ago
|
||
Bulk-unhiding security bugs fixed in Firefox 119-121 (Fall 2023). Use "moo-doctrine-subsidy" to filter
Description
•