Closed Bug 1325918 Opened 7 years ago Closed 4 years ago

Crash in content processes in BackgroundChildImpl::ProcessingError with message "MsgDropped: Channel error: cannot send/recv"

Categories

(Core :: IPC, defect)

defect

Tracking


RESOLVED FIXED
mozilla80
Tracking Status
firefox-esr45 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr68 --- wontfix
firefox-esr78 --- wontfix
firefox53 --- wontfix
firefox54 --- wontfix
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- fixed

People

(Reporter: ting, Assigned: jld)

References

Details

(Keywords: crash, Whiteboard: [geckoview][fenix:p1])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is based on report bp-3f1dcb0e-0558-4e15-9f3a-7b5992161226.
=============================================================
Top #7 crash in Nightly 20161225030206 on Windows: 5 crashes from 2 installations. The error message is: "Channel error: cannot send/recv."

The crash stack:

xul.dll!mozilla::ipc::BackgroundChildImpl::ProcessingError(mozilla::ipc::HasResultCodes::Result aCode, const char * aReason) Line 141	C++
xul.dll!mozilla::ipc::MessageChannel::ReportConnectionError(const char * aChannelName, IPC::Message * aMsg) Line 2085	C++
xul.dll!mozilla::ipc::MessageChannel::Send(IPC::Message * aMsg) Line 789	C++
xul.dll!mozilla::ipc::PBackgroundChild::SendPServiceWorkerManagerConstructor(mozilla::dom::PServiceWorkerManagerChild * actor) Line 680	C++
xul.dll!mozilla::dom::workers::ServiceWorkerManager::ActorCreated(mozilla::ipc::PBackgroundChild * aActor) Line 1738	C++
xul.dll!`anonymous namespace'::ChildImpl::OpenChildProcessActorRunnable::Run() Line 1883	C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1219	C++
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 96	C++
xul.dll!mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate * aDelegate) Line 301	C++
xul.dll!MessageLoop::RunHandler() Line 232	C++
xul.dll!MessageLoop::Run() Line 212	C++
xul.dll!nsBaseAppShell::Run() Line 158	C++
xul.dll!nsAppShell::Run() Line 262	C++
xul.dll!XRE_RunAppShell() Line 924	C++
xul.dll!mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate * aDelegate) Line 278	C++
xul.dll!MessageLoop::RunHandler() Line 232	C++
xul.dll!MessageLoop::Run() Line 212	C++
xul.dll!XRE_InitChildProcess(int aArgc, char * * aArgv, const XREChildData * aChildData) Line 760	C++
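The stack above goes Send -> ReportConnectionError -> ProcessingError. The following is a minimal C++ sketch of that control flow, assuming simplified stand-in types. `Listener`, `Channel`, `ReasonFor`, and the `Outcome` enum are hypothetical and are not the real `mozilla::ipc` declarations. Only the "Channel error: cannot send/recv" reason string comes from this report; the other reason strings are placeholders. The sketch records `Outcome::Crash` instead of aborting, which stands in for the pre-fix assumption that every reported error is fatal.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for the types in the stack above.
// These are NOT the real mozilla::ipc declarations.
enum class Result { MsgProcessed, MsgDropped };
enum class ChannelState { Closed, Opening, Connected, Error };

// Records what the listener would have done instead of actually aborting.
enum class Outcome { None, Crash };

struct Listener {
  Outcome last = Outcome::None;
  std::string lastReason;

  // Assumed pre-fix behavior: every reported error is fatal.
  void ProcessingError(Result aCode, const char* aReason) {
    lastReason = aReason;
    if (aCode != Result::MsgProcessed) {
      last = Outcome::Crash;
    }
  }
};

// The reason text depends only on the current state. Only the Error string
// is quoted in this bug; the others are placeholders.
const char* ReasonFor(ChannelState aState) {
  switch (aState) {
    case ChannelState::Closed:
      return "Closed channel: cannot send/recv";
    case ChannelState::Opening:
      return "Opening channel: not yet ready for send/recv";
    case ChannelState::Error:
      return "Channel error: cannot send/recv";
    case ChannelState::Connected:
      break;
  }
  return "";
}

struct Channel {
  ChannelState state = ChannelState::Opening;
  Listener* listener = nullptr;

  void ReportConnectionError() {
    assert(listener != nullptr);
    listener->ProcessingError(Result::MsgDropped, ReasonFor(state));
  }

  // Mirrors Send -> ReportConnectionError -> ProcessingError in the stack.
  bool Send(const std::string& /* aMsg */) {
    if (state != ChannelState::Connected) {
      ReportConnectionError();
      return false;
    }
    return true;  // a real channel would enqueue and transmit here
  }
};
```

For example, with `state` set to `ChannelState::Error`, `Send` returns false and the listener records `Outcome::Crash` with the reason "Channel error: cannot send/recv". That matches the message in this report.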
Hello Ben, this is a significant crash; could you please help with it?
Flags: needinfo?(bhsu)
Priority: -- → P1
Sure!
Assignee: nobody → bhsu
Flags: needinfo?(bhsu)
After discussing with some colleagues, we think bug 1293284 is the root cause of this crash: the crash takes place while the content process is being initialized, so the uptime here is quite short (~2 sec). However, I failed to reproduce this manually, and I am now trying to reproduce it automatically.
See Also: → 1293284
Crash volume for signature 'mozilla::ipc::BackgroundChildImpl::ProcessingError':
 - nightly (version 54): 28 crashes from 2017-01-23.
 - aurora  (version 53): 773 crashes from 2017-01-23.
 - beta    (version 52): 38 crashes from 2017-01-23.
 - release (version 51): 88 crashes from 2017-01-16.
 - esr     (version 45): 1 crash from 2016-08-10.

Crash volume over the last weeks (Week N is from 02-06 to 02-12):
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       1      24
 - aurora      390       2
 - beta         31       3
 - release      50      14       0
 - esr           0       0       0       0       0       0       0

Affected platforms: Windows, Mac OS X, Linux

Crash rank over the last 7 days:
           Browser   Content   Plugin
 - nightly           #55
 - aurora            #3
 - beta    #7311     #191
 - release           #340
 - esr
#3 Windows topcrash in Aurora 20170217004020.
Update: I talked with Kanru, and he hopes to get to bug 1293284 shortly; it is believed to be the root cause of this crash pattern.
#3 Windows topcrash in Aurora 20170324004022, still.
(In reply to Nicholas Nethercote [:njn] from comment #7)
> #3 Windows topcrash in Aurora 20170324004022, still.

If the theory that these are due to bug 1293284 is correct, is it possible that this is due to enabling multi-e10s on Aurora? Since the ServiceWorkerManager (SWM) always starts when the content process is spawned, it is very likely to be on the stack if we hit this early-shutdown crash.

Or is there something in the tree launching and killing content processes very quickly?
#3 Windows topcrash in Aurora 20170331004006, still!
Mass wontfix for bugs affecting firefox 52.
Many of the crash comments link the issue to running Firefox within the external Sandboxie tool.
Priority: P1 → P2
Assignee: bhsu → nobody
Bug 1293284 was fixed. Let's see how the crash volume was affected.
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3

Hi Hsin-Yi. This issue has seen a recent uptick in crashes on 77 in Fenix Beta (1340 crashes in the last 7 days). Can someone please take a look?

Flags: needinfo?(htsai)
Whiteboard: [geckoview][fenix:p1]

(In reply to Emily Toop (:fluffyemily) from comment #14)

> Hi Hsin-Yi. This issue has seen a recent uptick in crashes on 77 in Fenix Beta (1340 crashes in the last 7 days). Can someone please take a look?

Deferring to Jens :)

Flags: needinfo?(htsai) → needinfo?(jstutte)

The crash signature is not ServiceWorker specific now (if it ever was).

On Fenix, many of these crashes come from the "Socket Thread". Some desktop Firefox crashes happen on a wider range of threads. All of them occur in processes that identify themselves as content processes and accordingly have no "IPDL Background" thread. The main loops of the content processes don't seem to indicate that they think they are shutting down.

I presume something weird is happening: either an intentional shutdown of the parent that didn't wait for the child, or an unintentional termination of the parent that still leaves the child able to produce a crash report.

In any event, it seems like the BackgroundChild instances don't need to cause crashes now that we generally accept that messages can and will be dropped?

Severity: critical → --
Component: DOM: Service Workers → IPC
Flags: needinfo?(jld)
Priority: P3 → --
Summary: Crash in mozilla::ipc::BackgroundChildImpl::ProcessingError from PBackgroundChild::SendPServiceWorkerManagerConstructor → Crash in content processes in BackgroundChildImpl::ProcessingError with message "MsgDropped: Channel error: cannot send/recv"
Flags: needinfo?(jstutte)

Any chance someone could take a look at this crash please? It's a fairly big Fenix crasher.

So, most top-level actors either ignore MsgDropped or, in some cases like GMPChild, immediately exit the process. Presumably this is on the assumption that a channel that is unexpectedly closed or otherwise broken means the other process (in that case, the parent) has probably exited, so exiting is the most useful thing the child can do.

The weird thing about these crashes is that they're from our crash reporter, which runs in the parent process and therefore requires it to still exist, but the most obvious reason to get into this state is the parent process itself having crashed (or being killed, especially on Android because of the LMK). We've had bugs before where Android's own crash reporter was reporting content process crashes (deliberate) after the parent process had crashed (due to some other bug) and that at least made sense.

The crash message just tells us that the channel was in ChannelError state, which has no associated info about why. It could be from an actual I/O error (as in the case where the other process crashes)… or a channel being closed while it's still in the ChannelOpening state. PBackground channels are closed when the child thread that owns them exits, so I wonder if this is happening during shutdown, although from comment #16 that may not make sense.
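To illustrate why the ChannelError state says nothing about its cause, here is a minimal C++ sketch of a simplified channel state machine. It assumes two different events, an I/O error (for example, the peer process dying) and a close while still opening, both lead to the same `Error` state without recording which one occurred. `ChannelState`, `Event`, and `Next` are hypothetical names; this is not the real MessageChannel implementation.

```cpp
#include <cassert>

// Hypothetical, simplified model of the channel states discussed above;
// not the real MessageChannel state machine.
enum class ChannelState { Closed, Opening, Connected, Error };

// Events that can change the state.
enum class Event {
  Connected,  // the connection handshake completed
  IoError,    // e.g. the peer process crashed or was killed
  Close       // the owning thread closed the channel
};

ChannelState Next(ChannelState aState, Event aEvent) {
  switch (aEvent) {
    case Event::Connected:
      return aState == ChannelState::Opening ? ChannelState::Connected
                                             : aState;
    case Event::IoError:
      return ChannelState::Error;
    case Event::Close:
      // Closing before the channel finished opening is recorded as an error.
      return aState == ChannelState::Opening ? ChannelState::Error
                                             : ChannelState::Closed;
  }
  assert(false && "unhandled event");
  return aState;
}
```

Both `Next(Connected, IoError)` and `Next(Opening, Close)` yield `ChannelState::Error`. Anything that inspects only the state afterwards, such as an error-reporting path that builds its message from the state, cannot tell the two causes apart.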

In any case I think we can just ignore the error, and if there's some user-visible problem underlying this we'll hopefully find out in some other, more actionable way.
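A minimal C++ sketch of the policy proposed here: treat MsgDropped as non-fatal and keep crashing on other error codes. The `Result` enum below is a stand-in. MsgDropped is the code from this bug's crash message, and the other enumerators are illustrative. The assumption that other codes keep their existing crash behavior is ours; the landed change in BackgroundChildImpl may be structured differently. `OnProcessingError` and `Action` are hypothetical names, and the function returns `Action::Crash` where a real handler would abort.

```cpp
#include <cassert>

// Hypothetical stand-in for the IPC result codes. MsgDropped is the code in
// this bug's crash message; the other enumerators are illustrative.
enum class Result { MsgDropped, MsgPayloadError, MsgProcessingError };

// What the handler decides; a real handler would abort instead of
// returning Crash.
enum class Action { Ignore, Crash };

Action OnProcessingError(Result aCode, const char* aReason) {
  assert(aReason != nullptr);
  (void)aReason;  // only checked in debug builds in this sketch
  switch (aCode) {
    case Result::MsgDropped:
      // The peer is most likely gone or the channel already closed;
      // crashing this process adds no useful information.
      return Action::Ignore;
    case Result::MsgPayloadError:
    case Result::MsgProcessingError:
      break;
  }
  // Assumed: all other errors keep the existing crash behavior.
  return Action::Crash;
}
```

Keeping the other codes fatal preserves crash reports for genuine protocol bugs, such as malformed payloads. Dropped messages, which are expected when the other side goes away, stop producing reports.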

Assignee: nobody → jld
Flags: needinfo?(jld)
OS: Windows 8 → Unspecified
Hardware: x86 → Unspecified
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9465a3d25cf9
Ignore MsgDropped errors in BackgroundChildImpl. r=nika
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80
