Closed Bug 540097 Opened 15 years ago Closed 14 years ago

IPDL: Errors processing messages should result in the child process being terminated

Categories

(Core :: IPC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta5+

People

(Reporter: cjones, Assigned: cjones)

References

Details

Attachments

(2 files, 1 obsolete file)

Basically this includes any condition in which a Send*() or Call*() method would return false.  In the parent, we'll |kill(SIGKILL, child)|, and in the child, we'd commit suicide.
I've come across some cases where content-process code continues to attempt to send messages long after a channel error or Close(), and bad things happen.  It's not really the fault of that code, since the content-process contract is that it commits seppuku when anything goes wrong.  So, I'm going to co-opt this bug for finishing some of this work.

I've added a new |ProcessingError(what)| notification interface for top-level actors.  It's called whenever there's a "processing error", which includes sending messages to a closed channel, handler returns false, deserialization error, and more.  This complements the existing Close()/ActorDestroy(why) path on error and shutdown.

Also, I patched ContentChild to _exit(0) from ActorDestroy() for release builds.  This is a step in the right direction by eliding all the XPCOM shutdown goop, but probably still isn't quite what we want --- instead of Close()ing the channel or requesting Destroy() or whatever, we should just |kill(SIGKILL)| the content process from chome, as described in comment 0.  We can do this in a followup if/when shutdown becomes annoying.
Assignee: nobody → jones.chris.g
Attachment #460633 - Flags: review?(bent.mozilla)
tracking-fennec: --- → ?
Fixed typo in qhead, and fixed the test a bit while I was at it.
Attachment #460633 - Attachment is obsolete: true
Attachment #460642 - Flags: review?(bent.mozilla)
Attachment #460633 - Flags: review?(bent.mozilla)
Comment on attachment 460642 [details] [diff] [review]
Add a ProcessingError(what) notification interface for top-level actors

Looks good.
Attachment #460642 - Flags: review?(bent.mozilla) → review+
I've nom'd this for blocking for two reasons: first, the contract with content-process code is that it doesn't need to check return values or care about IPC connectivity.  This patch fulfills the ipc/glue part of that contract by allowing ContentChild to early-exit on connectivity errors, which may or may not be detected before the "connection error" event is posted to the main thread.  Second, this patch adds RUNTIMEABORT() for "real errors", like invalid messages.  These are really really bad, and I'd like to smoke them out if they exist ASAP.
blocking2.0: --- → beta4+
tracking-fennec: ? → ---
Comment on attachment 460634 [details] [diff] [review]
Fix content processeses to (i) _exit(0) from ActorDestroy() in release builds; (ii) _exit(0) on send-on-closed-channel; and (iii) abort on other processing errors

>diff --git a/dom/ipc/ContentChild.cpp b/dom/ipc/ContentChild.cpp

> void
> ContentChild::ActorDestroy(ActorDestroyReason why)
> {
>     if (AbnormalShutdown == why)
>         NS_WARNING("shutting down because of crash!");
> 
>+#ifndef DEBUG
>+    QuickExit();
>+#endif

Hrm. I think we should quick-exit if AbnormalShutdown != NormalShutdown even in debug builds.

We're not going to do this for plugin children, or will that be a different patch?

r=me with that change
Attachment #460634 - Flags: review?(benjamin) → review+
(In reply to comment #7)
> Comment on attachment 460634 [details] [diff] [review]
> Fix content processeses to (i) _exit(0) from ActorDestroy() in release builds;
> (ii) _exit(0) on send-on-closed-channel; and (iii) abort on other processing
> errors
> Hrm. I think we should quick-exit if AbnormalShutdown != NormalShutdown even in
> debug builds.
> 

Yeah, that makes sense.  Will fix.

> We're not going to do this for plugin children, or will that be a different
> patch?
> 

I don't think we should do that.  We don't know what persistent state plugins keep or how they synchronize it.  I kinda don't want to open that can of worms unless we have a reason to.  In fact, I'd almost be in favor of going in the other direction and ensuring that NP_Shutdown() is called regardless of how the plugin process shuts down.  But since this apparently isn't an issue in the field (yay stable firefox), I'm inclined also not to care.
Is this going to land?  It's a beta4 blocker and we are freezing in two days.  Additionally, it's unclear why this needs to block beta 4 instead of just shipping in a beta.  Please clarify.
No answer --> beta5+, so sad, too bad.
blocking2.0: beta4+ → beta5+
Was on PTO.  This wasn't a beta4 blocker, though I think it should be a fennecb1 blocker.
To be clear, I meant "I don't think this should have blocked beta4", not "this wasn't marked blocking:beta4+".
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: