Closed Bug 1765393 Opened 2 years ago Closed 2 years ago

Crash in [@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild]

Categories

(Core :: DOM: Content Processes, defect)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
102 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox99 --- unaffected
firefox100 --- unaffected
firefox101 + fixed
firefox102 + fixed

People

(Reporter: aryx, Unassigned)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

46 crashes from 37 Windows installations. The signature surfaced after bug 1762604 got fixed.

Crash report: https://crash-stats.mozilla.org/report/index/d6e8e86d-5c6a-4652-9412-c6de30220419

MOZ_CRASH Reason: MOZ_CRASH(MessageChannel destroyed without being closed (mChannelState == ChannelConnected).)

Top 10 frames of crashing thread:

0 xul.dll mozilla::ipc::MessageChannel::~MessageChannel ipc/glue/MessageChannel.cpp:475
1 xul.dll mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild ipc/ipdl/PProcessHangMonitorParent.cpp:42
2 xul.dll `anonymous namespace'::HangMonitorParent::~HangMonitorParent dom/ipc/ProcessHangMonitor.cpp:675
3 xul.dll static mozilla::ProcessHangMonitor::RemoveProcess dom/ipc/ProcessHangMonitor.cpp:1198
4 xul.dll mozilla::dom::ContentParent::ActorDestroy dom/ipc/ContentParent.cpp:1975
5 xul.dll mozilla::ipc::IProtocol::DestroySubtree ipc/glue/ProtocolUtils.cpp:577
6 xul.dll mozilla::dom::PContentParent::OnChannelClose ipc/ipdl/PContentParent.cpp:16902
7 xul.dll mozilla::ipc::MessageChannel::Close ipc/glue/MessageChannel.cpp:2082
8 xul.dll mozilla::ThreadEventQueue::RunShutdownTasks xpcom/threads/ThreadEventQueue.cpp:315
9 xul.dll nsThreadManager::Shutdown xpcom/threads/nsThreadManager.cpp:357
Flags: needinfo?(jstutte)

It looks as if there was some misalignment between mIsOpen and the real channel state that makes us not call Close() ?

I wonder if we really need this mIsOpen at all, can't we just rely on the channel status itself (CanSend())? Still I do not see where it could actually go wrong from code reading, we unset it only in ActorDestroy. Is there a timeframe after ActorDestroy where the channel still results open and we see a race with a previous Close() on the same channel here?

In any case it seems related to bug 1738104, which has been backed out from beta.

Flags: needinfo?(jstutte) → needinfo?(nika)

(In reply to Jens Stutte [:jstutte] from comment #1)

I wonder if we really need this mIsOpen at all, can't we just rely on the channel status itself (CanSend())?

Many actors have a manual mIsOpen or similar flag which they use because they added it before I added CanSend() and CanRecv() to actors themselves.

Still I do not see where it could actually go wrong from code reading, we unset it only in ActorDestroy. Is there a timeframe after ActorDestroy where the channel still results open and we see a race with a previous Close() on the same channel here?

The issue here is that we're late during shutdown, so the Dispatch call to dispatch to the background thread fails due to gXPCOMThreadsShutdown being set to true (despite the thread still being alive): https://searchfox.org/mozilla-central/rev/2fd70a667e6f89a9ec6622ecb302b1ab48e4a062/dom/ipc/ProcessHangMonitor.cpp#695-697. This means that Shutdown() returns early without waiting for the background thread to have actually closed the actor, leading to this crash.

If we remove the gXPCOMThreadsShutdown check from thread dispatch things will probably work better, as background threads will continue to work until they've actually been shut down.

Flags: needinfo?(nika)

I assume you refer to this check, such that mSink->PutEvent will decide if we can dispatch.

(In reply to Jens Stutte [:jstutte] from comment #3)

I assume you refer to this check, such that mSink->PutEvent will decide if we can dispatch.

Yeah, I'm thinking of killing that check in my patches to change when we run shutdown tasks during xpcom shutdown

Crash Signature: [@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild] → [@ mozilla::ipc::MessageChannel::~MessageChannel | (anonymous namespace)::HangMonitorParent::~HangMonitorParent] [@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild]
Has Regression Range: --- → yes
Keywords: regression
Regressed by: 1757802

The regressing bug 1757802 has been backed out

Fixed by backout.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 101 Branch

This crash is still happening, even after the backout: bp-8dc3d7a5-001a-4434-aae1-c74360220429

I'll clear the regressed by field because that bug has been backed out. Maybe this was an existing issue?

Status: RESOLVED → REOPENED
No longer regressed by: 1757802
Resolution: FIXED → ---

The MOZ_CRASH as such has been introduced by https://phabricator.services.mozilla.com/D119354 in bug 1719577, IIUC.

From the statistics I guess bug 1738104 has some influence here for the current instances, as it is still happening in Nightly but since that bug was backed out we see no more instances in beta. And the recent crashes seem to have mozilla::ThreadEventQueue::RunShutdownTasks on their stacks.

Flags: needinfo?(nika)
Regressed by: 1738104

Yeah, this seems likely to be related to bug 1738104. I'm optimistic that this will be fixed by the change in https://phabricator.services.mozilla.com/D144592 which will make the late-during-shutdown dispatch which failed and is being hung on succeed.

Depends on: 1764119
Flags: needinfo?(nika)

Set release status flags based on info from the regressing bug 1738104

No crashes on Nightly since bug 1764119 landed - are we good to uplift to Beta?

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Flags: needinfo?(nika)
Resolution: --- → FIXED
Target Milestone: 101 Branch → 102 Branch

Filed an uplift request on bug 1764119

Flags: needinfo?(nika)

No crashes in 101.0b4, thanks!

You need to log in before you can comment on or make changes to this bug.