Handle unprocessed IPC messages instead of aborting in ContentChild.cpp

Status: NEW
Assignee: Unassigned
Product: Firefox OS
Component: General
Priority: --
Severity: enhancement
Opened 5 years ago, updated 5 years ago
Reporter: leo.bugzilla.gecko
Tracking flags: not tracked
Attachments: 2

Description (Reporter, 5 years ago)

This issue is derived from bug 867025.

If the sender and receiver are mismatched while processing an IPC message, the process is always aborted by NS_RUNTIMEABORT() in ContentChild::ProcessingError().

Gecko   : 
Gecko   : ###!!! [Child][AsyncChannel] Error: Route error: message sent to unknown actor ID
Gecko   : 
Gecko   : [Child 5877] ###!!! ABORT: aborting because of fatal error: file B2G/gecko/dom/ipc/ContentChild.cpp, line 1020
Gecko   : mozalloc_abort: [Child 5877] ###!!! ABORT: aborting because of fatal error: file /B2G/gecko/dom/ipc/ContentChild.cpp, line 1020

This happens if AsyncChannel has any messages that can't be handled by the receiver.
Maybe AsyncChannel could handle this error properly instead of calling NS_RUNTIMEABORT().
Also, NS_RUNTIMEABORT() causes a crash report to be shown to the user.
I don't think that is good for users, who may conclude that the device is unstable.

Because this IPC message is no longer needed, how about just removing the unnecessary IPC message from AsyncChannel?
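
As a rough illustration of the proposal, the sketch below models a channel that looks up the destination actor by ID and reports an unknown ID to the caller instead of aborting. It is a toy model and not Gecko code: `ToyChannel`, `Message`, `RouteResult` and all method names are hypothetical and only stand in for the routing step that produces the "message sent to unknown actor ID" error in the log above.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical model, not Gecko code: a channel that routes messages to
// actors by ID and tells the caller what happened to each message.
enum class RouteResult { Delivered, DroppedUnknownActor };

struct Message {
  int32_t actorId;      // routing ID of the destination actor
  std::string payload;  // opaque message body
};

class ToyChannel {
 public:
  using Handler = std::function<void(const Message&)>;

  // Adds or removes an actor from the routing table.
  void Register(int32_t id, Handler handler) { mActors[id] = std::move(handler); }
  void Unregister(int32_t id) { mActors.erase(id); }

  // Looks up the destination actor. An unknown ID corresponds to the route
  // error in the log; here it is counted and reported rather than treated
  // as fatal, which is the behaviour the description asks about.
  RouteResult Dispatch(const Message& msg) {
    auto it = mActors.find(msg.actorId);
    if (it == mActors.end()) {
      ++mDropped;
      return RouteResult::DroppedUnknownActor;
    }
    it->second(msg);
    return RouteResult::Delivered;
  }

  // Number of messages dropped because their actor no longer existed.
  int dropped() const { return mDropped; }

 private:
  std::unordered_map<int32_t, Handler> mActors;
  int mDropped = 0;
};
```

In this model, dispatching to an actor after `Unregister` returns `RouteResult::DroppedUnknownActor` and increments the drop counter. Note that dropping the message hides the fact that the sender and receiver disagree about the actor's lifetime; it does not resolve that disagreement.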

Comment 1

Created attachment 752055 [details]
tara crash report

I caught another 'mozilla::dom::ContentChild::ProcessingError' crash in a monkey test, after applying the bug 867025 patch.
According to snapshots 1 and 2, this crash may be in the Gallery app.

Comment 2

> if AsyncChannel has any messages that can't be handled by the receiver.

Before we ignore an error, we must first understand why we get into an erroneous state.

How do we get into this state?

Comment 3

Created attachment 754307 [details]
unagi crash report

This crash occurred again, and Gaia contains Dale's patch.

git log camera.js
commit 76a8a13a8eea7ff98b1b80992b307bddb8f50daa
Author: David Flanagan <dflanagan@mozilla.com>
Date:   Fri May 17 14:52:19 2013 -0700

    Merge pull request #9826 from daleharvey/camera-monkey
    
    Bug 867025 - Disable cancel activity button when picture taken r=@davidflanagan(cherry picked from commit 771ad40f7f7fc3b36aae2ea0c25b091e29d626c6)

Comment 4

> If the sender and receiver are mismatched while processing an IPC message, the process is always aborted
> by NS_RUNTIMEABORT() in ContentChild::ProcessingError().

I don't understand what you mean by "the sender and receiver are mismatched while processing an IPC message."

> Because this IPC message is no longer needed, how about just removing the unnecessary IPC message
> from AsyncChannel?

I don't understand what message you're referring to.  Perhaps it would be helpful if you could attach a patch.

I'd really like us to understand why we're crashing and add a testcase (probably a crashtest).  If all we're doing is removing an NS_RUNTIMEABORT without understanding why we're hitting that path, that is not good.

Comment 5

> I don't understand what you mean by "the sender and receiver are mismatched
> while processing an IPC message."

It means that when the receiver (actor) disappears in an exceptional case, the sender may not know the receiver's status, because IPC messages are delivered asynchronously.

> I don't understand what message you're referring to.  Perhaps it would be
> helpful if you could attach a patch.

It is a proposal to avoid aborting the whole process because of an unhandled IPC message.
Honestly, the IPC mechanism is hard to debug. First of all, the problem is hard to reproduce.

Comment 6

5 years ago
The only way this occurs is if an IPDL protocol is specified in an unsafe way. For example, if a child is responsible for sending the __delete__ message, the protocol should be designed such that the parent is notified that a __delete__ is expected so it can stop sending messages. Without this, a race can occur where the parent sends a regular message at the same time as the child sends __delete__, so the child sees a message sent to an actor that no longer exists. This is not an abort we should remove.
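
The race described in this comment, and the handshake that avoids it, can be sketched with a small deterministic simulation. This is a hypothetical model and not IPDL or Gecko code: `ToyProtocol`, the `WillDelete` and `DeleteAck` messages, and every function name are assumptions made for illustration. Only `__delete__` is taken from the comment. Each direction of the channel is modelled as an ordered queue.

```cpp
#include <deque>
#include <string>

// Hypothetical model, not IPDL or Gecko code. Each direction of the channel
// is an ordered queue, and the child-side actor is either alive or destroyed.
struct ToyProtocol {
  std::deque<std::string> toChild;   // messages in flight, parent -> child
  std::deque<std::string> toParent;  // messages in flight, child -> parent
  bool childActorAlive = true;
  bool parentMaySend = true;
  int routeErrors = 0;  // messages that reached a destroyed child actor

  // Parent side: send a regular message unless it has been told to stop.
  void ParentSend(const std::string& msg) {
    if (parentMaySend) toChild.push_back(msg);
  }

  // Child side: announce that a __delete__ is coming (hypothetical message).
  void ChildAnnounceDelete() { toParent.push_back("WillDelete"); }

  // Child side: destroy the actor and tell the parent.
  void ChildDelete() {
    childActorAlive = false;
    toParent.push_back("__delete__");
  }

  // Parent side: process everything the child has sent so far.
  void ParentPump() {
    while (!toParent.empty()) {
      std::string msg = toParent.front();
      toParent.pop_front();
      if (msg == "WillDelete") {
        parentMaySend = false;            // stop sending regular messages
        toChild.push_back("DeleteAck");   // last message the parent sends
      } else if (msg == "__delete__") {
        parentMaySend = false;            // actor is gone on both sides
      }
    }
  }

  // Child side: process everything the parent has sent so far.
  void ChildPump() {
    while (!toChild.empty()) {
      std::string msg = toChild.front();
      toChild.pop_front();
      if (!childActorAlive) {
        ++routeErrors;  // message for an actor that no longer exists
        continue;
      }
      if (msg == "DeleteAck") ChildDelete();  // safe: nothing else in flight
    }
  }
};

// Unsafe design: the child deletes its actor while a regular message from
// the parent is still in flight.
int RunUnsafe() {
  ToyProtocol p;
  p.ParentSend("Update");  // in flight towards the child
  p.ChildDelete();         // child destroys its actor at the same time
  p.ChildPump();           // "Update" arrives at a destroyed actor
  return p.routeErrors;
}

// Safer design: the child announces the deletion and only deletes after the
// parent has acknowledged and stopped sending.
int RunWithHandshake() {
  ToyProtocol p;
  p.ParentSend("Update");    // in flight, as before
  p.ChildAnnounceDelete();   // actor stays alive for now
  p.ParentPump();            // parent stops sending and queues the ack
  p.ParentSend("Update2");   // ignored: the parent may no longer send
  p.ChildPump();             // "Update" is handled, then the ack triggers delete
  p.ParentPump();            // parent receives __delete__
  return p.routeErrors;
}
```

In this model `RunUnsafe()` returns 1, because `Update` reaches an actor that has already been destroyed, while `RunWithHandshake()` returns 0. The handshake relies on the channel being ordered: once the child has seen the acknowledgement, every earlier message from the parent has already been delivered.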