Crash in [@ NSSSocketControl::CloseSocketAndDestroy | mozilla::net::TLSTransportLayer::~TLSTransportLayer]
Categories
(Core :: Networking: WebSockets, defect, P1)
Tracking
()
People
(Reporter: aryx, Assigned: valentin)
References
Details
(4 keywords, Whiteboard: [necko-triaged][necko-priority-queue] [adv-main117+r] [adv-esr115.2+r])
Crash Data
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr115+
|
Details | Review |
10 crashes from 5 installations of the latest Firefox 118.0a1 Nightly. No time to identify the regressing change at the moment.
Crash report: https://crash-stats.mozilla.org/report/index/48f2f18e-4793-4a91-bf4a-5433e0230809
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(mOwningThread == PR_GetCurrentThread())
Top 10 frames of crashing thread:
0 xul.dll NSSSocketControl::CloseSocketAndDestroy security/manager/ssl/NSSSocketControl.cpp:474
1 xul.dll mozilla::net::TLSTransportLayer::~TLSTransportLayer netwerk/protocol/http/TLSTransportLayer.cpp:348
2 xul.dll mozilla::net::TLSTransportLayer::Release netwerk/protocol/http/TLSTransportLayer.cpp:323
3 xul.dll nsCOMPtr<nsISocketTransport>::~nsCOMPtr xpcom/base/nsCOMPtr.h:340
3 xul.dll mozilla::net::CallOnTransportAvailable::~CallOnTransportAvailable netwerk/protocol/websocket/WebSocketChannel.cpp:741
4 xul.dll mozilla::net::CallOnTransportAvailable::~CallOnTransportAvailable netwerk/protocol/websocket/WebSocketChannel.cpp:741
5 xul.dll mozilla::RefPtrTraits<nsIRunnable>::Release mfbt/RefPtr.h:54
5 xul.dll RefPtr<nsIRunnable>::ConstRemovingRefPtrTraits<nsIRunnable>::Release mfbt/RefPtr.h:420
5 xul.dll RefPtr<nsIRunnable>::assign_assuming_AddRef mfbt/RefPtr.h:73
5 xul.dll RefPtr<nsIRunnable>::operator= mfbt/RefPtr.h:188
Comment 1•2 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Security: PSM' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
![]() |
Reporter | |
Comment 2•2 years ago
|
||
Could this be caused by the changes in bug 1842754? The change in bug 1847604 is only flagged as affecting performance.
Comment 3•2 years ago
|
||
I don't see how bug 1842754 could be the cause, but I find it more likely that bug 1842754 causes the crash than bug 1847604.
Updated•2 years ago
|
Comment 4•2 years ago
|
||
:manuel any updates since earlier?
Is it ok to set bug 1842754 as the regressor or have you determined otherwise? Wondering if it should be backed out or you have a patch in mind?
Comment 5•2 years ago
|
||
I don't have a patch in mind. Let's back out bug 1842754 and see if that fixes the crashes? My further investigation only led me to
https://searchfox.org/mozilla-central/rev/dc8348b3730c0d29dafd01c653d9151eaa9bc30f/netwerk/protocol/websocket/WebSocketChannel.cpp#3722-3725
that the channel (or other object holding the TLSTransportLayer
) is dropped after the dispatch, but before the dispatch is run. That way the lifetime of the (refcounted) TLSTransportLayer object end in the different thread unexpectedly.
Comment 6•2 years ago
|
||
I'll ask the sheriffs to backout
Comment 7•2 years ago
|
||
Comment 8•2 years ago
|
||
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/bc1cbc172685007e724fad1ea24efc373c909c6d
Comment 9•2 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly
For more information, please visit BugBot documentation.
![]() |
||
Comment 10•2 years ago
|
||
I don't think bug 1842754 is the issue here. The reason we're seeing crashes in the wild is because bug 1847260 upgraded an assertion to a diagnostic assertion to determine if/where a thread invariant is being violated, and lo and behold, this is one of them. It looks like CallOnTransportAvailable
is releasing a TLSTransportLayer
on the main thread rather than the socket thread.
Comment 11•2 years ago
|
||
:(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #10)
I don't think bug 1842754 is the issue here. The reason we're seeing crashes in the wild is because bug 1847260 upgraded an assertion to a diagnostic assertion to determine if/where a thread invariant is being violated, and lo and behold, this is one of them. It looks like
CallOnTransportAvailable
is releasing aTLSTransportLayer
on the main thread rather than the socket thread.
:keeler Thanks for the info, given this is a new top crash what's the next step for investigation/assignee for a fix?
![]() |
||
Comment 12•2 years ago
|
||
Valentin is the triage owner for this component.
As for a fix, should be a pretty straightforward use of ProxyRelease
.
Comment 13•2 years ago
|
||
This indicates a race, so it should probably be a security bug. I'm assuming it affects all branches and we're just seeing the crashes here because of the diagnostic asserts (which won't apply to older branches, even if we uplifted the checks).
Comment 14•2 years ago
|
||
[@ NSSSocketControl::CloseSocketAndDestroy | nsSSLIOLayerClose ] looks like the same basic thing: bp-7281f805-6efa-44aa-9ae0-43d6e0230810
Assignee | ||
Comment 15•2 years ago
|
||
It seems TLSTransportLayer is marked NS_DECL_THREADSAFE_ISUPPORTS, but we only expect ~TLSTransportLayer to be called on the socket thread.
I'll try dispatching to the socket thread instead.
Assignee | ||
Comment 16•2 years ago
|
||
Comment 17•2 years ago
|
||
The bug is marked as tracked for firefox118 (nightly). However, the bug still has low severity.
:ghess, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Assignee | ||
Updated•2 years ago
|
Comment 18•2 years ago
|
||
Comment 20•2 years ago
|
||
Comment 21•2 years ago
|
||
The patch landed in nightly and beta is affected.
:valentin, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval. Also, don't forget to request an uplift for the patches in the regression caused by this fix.
- If no, please set
status-firefox117
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 22•2 years ago
|
||
Requested uplift in bug 1848999 instead.
Comment 23•2 years ago
|
||
Comment on attachment 9348525 [details]
Bug 1847904 - Dispatch PR_Close to socketthread in ~TLSTransportLayer r=keeler
Needed for bug 1848999, approved for 117.0b9 and 115.2esr.
Comment 24•2 years ago
|
||
uplift |
Updated•2 years ago
|
Updated•2 years ago
|
Comment 25•2 years ago
|
||
uplift |
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Description
•