Crash in [@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild]
Categories
(Core :: DOM: Content Processes, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox99 | --- | unaffected |
firefox100 | --- | unaffected |
firefox101 | + | fixed |
firefox102 | + | fixed |
People
(Reporter: aryx, Unassigned)
References
(Regression)
Details
(Keywords: crash, regression)
Crash Data
46 crashes from 37 Windows installations. The signature surfaced after bug 1762604 got fixed.
Crash report: https://crash-stats.mozilla.org/report/index/d6e8e86d-5c6a-4652-9412-c6de30220419
MOZ_CRASH Reason: MOZ_CRASH(MessageChannel destroyed without being closed (mChannelState == ChannelConnected).)
Top 10 frames of crashing thread:
0 xul.dll mozilla::ipc::MessageChannel::~MessageChannel ipc/glue/MessageChannel.cpp:475
1 xul.dll mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild ipc/ipdl/PProcessHangMonitorParent.cpp:42
2 xul.dll `anonymous namespace'::HangMonitorParent::~HangMonitorParent dom/ipc/ProcessHangMonitor.cpp:675
3 xul.dll static mozilla::ProcessHangMonitor::RemoveProcess dom/ipc/ProcessHangMonitor.cpp:1198
4 xul.dll mozilla::dom::ContentParent::ActorDestroy dom/ipc/ContentParent.cpp:1975
5 xul.dll mozilla::ipc::IProtocol::DestroySubtree ipc/glue/ProtocolUtils.cpp:577
6 xul.dll mozilla::dom::PContentParent::OnChannelClose ipc/ipdl/PContentParent.cpp:16902
7 xul.dll mozilla::ipc::MessageChannel::Close ipc/glue/MessageChannel.cpp:2082
8 xul.dll mozilla::ThreadEventQueue::RunShutdownTasks xpcom/threads/ThreadEventQueue.cpp:315
9 xul.dll nsThreadManager::Shutdown xpcom/threads/nsThreadManager.cpp:357
Comment 1•2 years ago
•
|
||
It looks as if there was some misalignment between mIsOpen
and the real channel state that makes us not call Close()
?
I wonder if we really need this mIsOpen
at all, can't we just rely on the channel status itself (CanSend()
)? Still I do not see where it could actually go wrong from code reading, we unset it only in ActorDestroy
. Is there a timeframe after ActorDestroy
where the channel still results open and we see a race with a previous Close()
on the same channel here?
In any case it seems related to bug 1738104, which has been backed out from beta.
Comment 2•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #1)
I wonder if we really need this
mIsOpen
at all, can't we just rely on the channel status itself (CanSend()
)?
Many actors have a manual mIsOpen
or similar flag which they use because they added it before I added CanSend()
and CanRecv()
to actors themselves.
Still I do not see where it could actually go wrong from code reading, we unset it only in
ActorDestroy
. Is there a timeframe afterActorDestroy
where the channel still results open and we see a race with a previousClose()
on the same channel here?
The issue here is that we're late during shutdown, so the Dispatch
call to dispatch to the background thread fails due to gXPCOMThreadsShutdown
being set to true (despite the thread still being alive): https://searchfox.org/mozilla-central/rev/2fd70a667e6f89a9ec6622ecb302b1ab48e4a062/dom/ipc/ProcessHangMonitor.cpp#695-697. This means that Shutdown()
returns early without waiting for the background thread to have actually closed the actor, leading to this crash.
If we remove the gXPCOMThreadsShutdown
check from thread dispatch things will probably work better, as background threads will continue to work until they've actually been shut down.
Comment 3•2 years ago
|
||
I assume you refer to this check, such that mSink->PutEvent
will decide if we can dispatch.
Comment 4•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #3)
I assume you refer to this check, such that
mSink->PutEvent
will decide if we can dispatch.
Yeah, I'm thinking of killing that check in my patches to change when we run shutdown tasks during xpcom shutdown
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Updated•2 years ago
|
Updated•2 years ago
|
Comment 5•2 years ago
|
||
The regressing bug 1757802 has been backed out
Comment 6•2 years ago
|
||
Fixed by backout.
Reporter | ||
Updated•2 years ago
|
Comment 7•2 years ago
|
||
This crash is still happening, even after the backout: bp-8dc3d7a5-001a-4434-aae1-c74360220429
I'll clear the regressed by field because that bug has been backed out. Maybe this was an existing issue?
Comment 8•2 years ago
|
||
The MOZ_CRASH
as such has been introduced by https://phabricator.services.mozilla.com/D119354 in bug 1719577, IIUC.
From the statistics I guess bug 1738104 has some influence here for the current instances, as it is still happening in Nightly but since that bug was backed out we see no more instances in beta. And the recent crashes seem to have mozilla::ThreadEventQueue::RunShutdownTasks
on their stacks.
Comment 9•2 years ago
|
||
Yeah, this seems likely to be related to bug 1738104. I'm optimistic that this will be fixed by the change in https://phabricator.services.mozilla.com/D144592 which will make the late-during-shutdown dispatch which failed and is being hung on succeed.
Comment 10•2 years ago
|
||
Set release status flags based on info from the regressing bug 1738104
Comment 11•2 years ago
|
||
No crashes on Nightly since bug 1764119 landed - are we good to uplift to Beta?
Comment 13•2 years ago
|
||
No crashes in 101.0b4, thanks!
Description
•