Closed Bug 1765393 Opened 3 years ago Closed 3 years ago

Crash in [@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild]

Tracking

()

Status:

RESOLVED FIXED

Milestone:

102 Branch

Tracking Flags:

Tracking

Status

firefox-esr91

---

unaffected

firefox99

---

unaffected

firefox100

---

unaffected

firefox101

fixed

firefox102

fixed

People

(Reporter: aryx, Unassigned)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Description

•

3 years ago

46 crashes from 37 Windows installations. The signature surfaced after bug 1762604 got fixed.

Crash report: https://crash-stats.mozilla.org/report/index/d6e8e86d-5c6a-4652-9412-c6de30220419

MOZ_CRASH Reason: MOZ_CRASH(MessageChannel destroyed without being closed (mChannelState == ChannelConnected).)

Top 10 frames of crashing thread:

0 xul.dll mozilla::ipc::MessageChannel::~MessageChannel ipc/glue/MessageChannel.cpp:475
1 xul.dll mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild ipc/ipdl/PProcessHangMonitorParent.cpp:42
2 xul.dll `anonymous namespace'::HangMonitorParent::~HangMonitorParent dom/ipc/ProcessHangMonitor.cpp:675
3 xul.dll static mozilla::ProcessHangMonitor::RemoveProcess dom/ipc/ProcessHangMonitor.cpp:1198
4 xul.dll mozilla::dom::ContentParent::ActorDestroy dom/ipc/ContentParent.cpp:1975
5 xul.dll mozilla::ipc::IProtocol::DestroySubtree ipc/glue/ProtocolUtils.cpp:577
6 xul.dll mozilla::dom::PContentParent::OnChannelClose ipc/ipdl/PContentParent.cpp:16902
7 xul.dll mozilla::ipc::MessageChannel::Close ipc/glue/MessageChannel.cpp:2082
8 xul.dll mozilla::ThreadEventQueue::RunShutdownTasks xpcom/threads/ThreadEventQueue.cpp:315
9 xul.dll nsThreadManager::Shutdown xpcom/threads/nsThreadManager.cpp:357

Flags: needinfo?(jstutte)

Jens Stutte [:jstutte]

Comment 1

•

3 years ago

•

Edited

It looks as if there was some misalignment between mIsOpen and the real channel state that makes us not call Close() ?

I wonder if we really need this mIsOpen at all, can't we just rely on the channel status itself (CanSend())? Still I do not see where it could actually go wrong from code reading, we unset it only in ActorDestroy. Is there a timeframe after ActorDestroy where the channel still results open and we see a race with a previous Close() on the same channel here?

In any case it seems related to bug 1738104, which has been backed out from beta.

Flags: needinfo?(jstutte) → needinfo?(nika)

Nika Layzell [:nika] (ni? for response)

Comment 2

•

3 years ago

(In reply to Jens Stutte [:jstutte] from comment #1)

I wonder if we really need this mIsOpen at all, can't we just rely on the channel status itself (CanSend())?

Many actors have a manual mIsOpen or similar flag which they use because they added it before I added CanSend() and CanRecv() to actors themselves.

Still I do not see where it could actually go wrong from code reading, we unset it only in ActorDestroy. Is there a timeframe after ActorDestroy where the channel still results open and we see a race with a previous Close() on the same channel here?

The issue here is that we're late during shutdown, so the Dispatch call to dispatch to the background thread fails due to gXPCOMThreadsShutdown being set to true (despite the thread still being alive): https://searchfox.org/mozilla-central/rev/2fd70a667e6f89a9ec6622ecb302b1ab48e4a062/dom/ipc/ProcessHangMonitor.cpp#695-697. This means that Shutdown() returns early without waiting for the background thread to have actually closed the actor, leading to this crash.

If we remove the gXPCOMThreadsShutdown check from thread dispatch things will probably work better, as background threads will continue to work until they've actually been shut down.

Flags: needinfo?(nika)

Jens Stutte [:jstutte]

Comment 3

•

3 years ago

I assume you refer to this check, such that mSink->PutEvent will decide if we can dispatch.

Nika Layzell [:nika] (ni? for response)

Comment 4

•

3 years ago

(In reply to Jens Stutte [:jstutte] from comment #3)

I assume you refer to this check, such that mSink->PutEvent will decide if we can dispatch.

Yeah, I'm thinking of killing that check in my patches to change when we run shutdown tasks during xpcom shutdown

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Updated

•

3 years ago

Crash Signature: [@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild] → [@ mozilla::ipc::MessageChannel::~MessageChannel | (anonymous namespace)::HangMonitorParent::~HangMonitorParent] [@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild]

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Updated

•

3 years ago

Has Regression Range: --- → yes

Keywords: regression

Regressed by: 1757802

Donal Meehan [:dmeehan]

Updated

•

3 years ago

tracking-firefox101: --- → +

Nika Layzell [:nika] (ni? for response)

Comment 5

•

3 years ago

The regressing bug 1757802 has been backed out

Donal Meehan [:dmeehan]

Comment 6

•

3 years ago

Fixed by backout.

Status: NEW → RESOLVED

Closed: 3 years ago

status-firefox101: affected → fixed

Resolution: --- → FIXED

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Updated

•

3 years ago

Target Milestone: --- → 101 Branch

Andrew McCreight [:mccr8]

Comment 7

•

3 years ago

This crash is still happening, even after the backout: bp-8dc3d7a5-001a-4434-aae1-c74360220429

I'll clear the regressed by field because that bug has been backed out. Maybe this was an existing issue?

Status: RESOLVED → REOPENED

status-firefox101: fixed → affected

No longer regressed by: 1757802

Resolution: FIXED → ---

Jens Stutte [:jstutte]

Comment 8

•

3 years ago

The MOZ_CRASH as such has been introduced by https://phabricator.services.mozilla.com/D119354 in bug 1719577, IIUC.

From the statistics I guess bug 1738104 has some influence here for the current instances, as it is still happening in Nightly but since that bug was backed out we see no more instances in beta. And the recent crashes seem to have mozilla::ThreadEventQueue::RunShutdownTasks on their stacks.

Flags: needinfo?(nika)

Regressed by: 1738104

Nika Layzell [:nika] (ni? for response)

Comment 9

•

3 years ago

Yeah, this seems likely to be related to bug 1738104. I'm optimistic that this will be fixed by the change in https://phabricator.services.mozilla.com/D144592 which will make the late-during-shutdown dispatch which failed and is being hung on succeed.

Depends on: 1764119

Flags: needinfo?(nika)

BugBot [:suhaib / :marco/ :calixte]

Comment 10

•

3 years ago

Set release status flags based on info from the regressing bug 1738104

status-firefox102: --- → affected

Ryan VanderMeulen [:RyanVM]

Comment 11

•

3 years ago

No crashes on Nightly since bug 1764119 landed - are we good to uplift to Beta?

Status: REOPENED → RESOLVED

Closed: 3 years ago → 3 years ago

status-firefox102: affected → fixed

tracking-firefox102: --- → +

Flags: needinfo?(nika)

Resolution: --- → FIXED

Target Milestone: 101 Branch → 102 Branch

Nika Layzell [:nika] (ni? for response)

Comment 12

•

3 years ago

Filed an uplift request on bug 1764119

Flags: needinfo?(nika)

Ryan VanderMeulen [:RyanVM]

Comment 13

•

3 years ago

No crashes in 101.0b4, thanks!

status-firefox101: affected → fixed

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Crash in [@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::PProcessHangMonitorChild::~PProcessHangMonitorChild]

Categories

(Core :: DOM: Content Processes, defect)

Tracking

()

People

(Reporter: aryx, Unassigned)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Updated

Updated

Comment 5

Comment 6

Updated

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13