Closed Bug 529005 Opened 15 years ago Closed 15 years ago

IPDL: Gracefully handle shutdown and broken sockets

Categories

(Core :: IPC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: cjones, Assigned: cjones)

References

Details

Attachments

(2 files, 2 obsolete files)

Shutdown will be distinguished from "broken socket"/crash by having the side initiating shutdown send a special GOODBYE message. (Should GOODBYE be allowed to race? Needs more thought; it's easy enough to handle in IPDL code.) When the IPC socket closes after GOODBYE, it's treated as normal exit. The socket closing without GOODBYE is "abnormal exit". In both cases, exit initiates a shutdown sequence. The assumption is that Gecko should have cleaned up its IPDL actors before shutdown, in "application logic." So from IPDL's perspective, shutdown is about closing out actors that should already have been closed. On the parent side, in both DEBUG and OPT builds, and on the child side in DEBUG, the following steps will be taken (1a) recv GOODBYE, close socket (1b) notified of closed socket (crash) (2) For each live actor, in bottom-up order (managed actors before their managers), invoke actor->Shutdown(bool normalExit); Shutdown() defaults to a no-op implementation. (3) After invoking each Shutdown() handler, enqueue a Finalize() event in the main event queue. (4) Finalize(): walk the actor tree bottom-up, delete all live actors In the child processes, in DEBUG builds, after step (4) app shutdown will be initiated. In child processes in OPT builds, all steps above will simply be replaced with an |_exit(0)| from the socket OnClose() handler.
Assignee: nobody → jones.chris.g
(In reply to comment #0) > (3) After invoking each Shutdown() handler, enqueue a Finalize() event in the > main event queue. > (4) Finalize(): walk the actor tree bottom-up, delete all live actors > Now that I've got this mostly implemented, I think we're going to have to skip posting Finalize() to the event loop and instead immediately start Finalize() after Shutdown(). (I had originally wanted to do this to allow "deferred delete" hacks like we've had to use in NPAPI.) Problem is, shutting down the subprocess might be the last action taken before shutting down XPCOM, so Finalize() might have ended up fighting with quitting the event loop.
Attached patch backup/WIP (obsolete) — Splinter Review
Most of the interesting work is finished, needs some more massaging to play well with NPAPI and ContentProcess.
Attached patch v1 (obsolete) — Splinter Review
Turned out to be quick and painless to graft in the old plugin shutdown behavior. Ready for r?.
Attachment #413041 - Attachment is obsolete: true
Attachment #413044 - Flags: review?(benjamin)
Attachment #413044 - Flags: review?(bent.mozilla)
Comment on attachment 413044 [details] [diff] [review] v1 When the channel receives the goodbye message, it doesn't immediately close itself, but it waits for NotifyMaybeChannelError to inform it about the actual socket closing. Is there any reason for that? Why can't we call NotifyChannelClosed immediately on receiving goodbye? Under what conditions is Shutdown() called on actors? Only when we're at "final" shutdown, or also when a manager actor is deleted? Is Shutdown() the primary mechanism parent actors have to know that of crashes on the other side? I think I'd prefer an explicit OnError() method, which eventually we'd require all parent actors to implement.
(In reply to comment #4) > (From update of attachment 413044 [details] [diff] [review]) > When the channel receives the goodbye message, it doesn't immediately close > itself, but it waits for NotifyMaybeChannelError to inform it about the actual > socket closing. Is there any reason for that? Why can't we call > NotifyChannelClosed immediately on receiving goodbye? > To keep state consistent between the IO thread and worker thread. The IO thread posts the Goodbye event, then a "socket closed" event, and there's a race between the IO thread posting "socket closed" and the worker thread processing "recvd Goodbye." It keeps things simpler to wait for "socket closed". That said, in OPT builds in the child process, we probably just want to _exit() on receiving Goodbye. > Under what conditions is Shutdown() called on actors? Only when we're at > "final" shutdown, or also when a manager actor is deleted? > Only at final shutdown. > Is Shutdown() the primary mechanism parent actors have to know that of crashes > on the other side? I think I'd prefer an explicit OnError() method, which > eventually we'd require all parent actors to implement. Yes. The interface is |Shutdown(bool normal)|, so on crashes actors receive |Shutdown(false)|. I don't think the code paths for shutdown vs. crash will be significantly different, so I'd prefer to go through the same interface.
Per IRC conversation yesterday, the consensus now is a unified deletion/shutdown/crash "destroy" method with a |why| parameter naming which condition triggered destruction. After thinking on it overnight, I've come around to the name "ActorDestroy", let's do it. It would be invoked as follows Normal deletion |Foo::Send__delete__(foo, ...)|: (1) foo->Recv__delete__(...) (2) foo->ActorDestroy(Deleted) (3) fooParent->DeallocFoo(foo) Parent deleted |FooParent::Send__delete__(fooParent, ...)|: (1) fooParent->Recv__delete__(...) (2) foo->ActorDestroy(ParentDeleted) [etc. for foo's siblings] (3) fooParent->ActorDestroy(Deleted) (4) fooParent->DeallocFoo(foo) (5) fooGrandparent->DeallocFooParent(fooParent) [if fooParent isn't toplevel] Normal shutdown: equivalent to the hypothetical Toplevel::Send__delete(toplevel), except that |why| is NormalShutdown Abnormal shutdown: equivalent to normal shutdown, except that |why| is AbnormalShutdown Thoughts?
yes!
Attachment #413044 - Flags: review?(benjamin)
Attachment #413044 - Attachment is obsolete: true
Attachment #413701 - Flags: review?(benjamin)
Attachment #413044 - Flags: review?(bent.mozilla)
Attachment #413701 - Flags: review?(bent.mozilla)
Comment on attachment 413701 [details] [diff] [review] v2, unified deletion/shutdown/crash handling LGTM
Attachment #413701 - Flags: review?(bent.mozilla) → review+
Attachment #413701 - Flags: review?(benjamin) → review+
Blocks: OOPP
While working on other bugs, I found a few bugs in this code, and decided to add an IPDL unit test. I also found that I had omitted the call to |XRE_ShutdownChildProcess()| in the plugin child process; I added that, but needed to juggle the implementation to suit my needs. My spidey-sense tells me just to push these along with the other changes, but I'd be remiss if I didn't at least offer the followup r?.
Attachment #415575 - Flags: review?(benjamin)
Attachment #415575 - Flags: review?(bent.mozilla)
Attachment #415575 - Flags: review?(benjamin) → review+
Just for the record, I'm going to push both these as a single cset.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: