Closed
Bug 529005
Opened 15 years ago
Closed 15 years ago
IPDL: Gracefully handle shutdown and broken sockets
Categories
(Core :: IPC, defect)
Core
IPC
Tracking
()
RESOLVED
FIXED
People
(Reporter: cjones, Assigned: cjones)
References
Details
Attachments
(2 files, 2 obsolete files)
59.77 KB,
patch
|
benjamin
:
review+
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
15.77 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
Shutdown will be distinguished from "broken socket"/crash by having the side initiating shutdown send a special GOODBYE message. (Should GOODBYE be allowed to race? Needs more thought; it's easy enough to handle in IPDL code.) When the IPC socket closes after GOODBYE, it's treated as normal exit. The socket closing without GOODBYE is "abnormal exit".
In both cases, exit initiates a shutdown sequence. The assumption is that Gecko should have cleaned up its IPDL actors before shutdown, in "application logic." So from IPDL's perspective, shutdown is about closing out actors that should already have been closed. On the parent side, in both DEBUG and OPT builds, and on the child side in DEBUG, the following steps will be taken
(1a) recv GOODBYE, close socket
(1b) notified of closed socket (crash)
(2) For each live actor, in bottom-up order (managed actors before their managers), invoke
actor->Shutdown(bool normalExit);
Shutdown() defaults to a no-op implementation.
(3) After invoking each Shutdown() handler, enqueue a Finalize() event in the main event queue.
(4) Finalize(): walk the actor tree bottom-up, delete all live actors
In the child processes, in DEBUG builds, after step (4) app shutdown will be initiated. In child processes in OPT builds, all steps above will simply be replaced with an |_exit(0)| from the socket OnClose() handler.
Assignee | ||
Updated•15 years ago
|
Assignee: nobody → jones.chris.g
Assignee | ||
Comment 1•15 years ago
|
||
(In reply to comment #0)
> (3) After invoking each Shutdown() handler, enqueue a Finalize() event in the
> main event queue.
> (4) Finalize(): walk the actor tree bottom-up, delete all live actors
>
Now that I've got this mostly implemented, I think we're going to have to skip posting Finalize() to the event loop and instead immediately start Finalize() after Shutdown(). (I had originally wanted to do this to allow "deferred delete" hacks like we've had to use in NPAPI.) Problem is, shutting down the subprocess might be the last action taken before shutting down XPCOM, so Finalize() might have ended up fighting with quitting the event loop.
Assignee | ||
Comment 2•15 years ago
|
||
Most of the interesting work is finished, needs some more massaging to play well with NPAPI and ContentProcess.
Assignee | ||
Comment 3•15 years ago
|
||
Turned out to be quick and painless to graft in the old plugin shutdown behavior. Ready for r?.
Attachment #413041 -
Attachment is obsolete: true
Attachment #413044 -
Flags: review?(benjamin)
Assignee | ||
Updated•15 years ago
|
Attachment #413044 -
Flags: review?(bent.mozilla)
Comment 4•15 years ago
|
||
Comment on attachment 413044 [details] [diff] [review]
v1
When the channel receives the goodbye message, it doesn't immediately close itself, but it waits for NotifyMaybeChannelError to inform it about the actual socket closing. Is there any reason for that? Why can't we call NotifyChannelClosed immediately on receiving goodbye?
Under what conditions is Shutdown() called on actors? Only when we're at "final" shutdown, or also when a manager actor is deleted?
Is Shutdown() the primary mechanism parent actors have to know that of crashes on the other side? I think I'd prefer an explicit OnError() method, which eventually we'd require all parent actors to implement.
Assignee | ||
Comment 5•15 years ago
|
||
(In reply to comment #4)
> (From update of attachment 413044 [details] [diff] [review])
> When the channel receives the goodbye message, it doesn't immediately close
> itself, but it waits for NotifyMaybeChannelError to inform it about the actual
> socket closing. Is there any reason for that? Why can't we call
> NotifyChannelClosed immediately on receiving goodbye?
>
To keep state consistent between the IO thread and worker thread. The IO thread posts the Goodbye event, then a "socket closed" event, and there's a race between the IO thread posting "socket closed" and the worker thread processing "recvd Goodbye." It keeps things simpler to wait for "socket closed".
That said, in OPT builds in the child process, we probably just want to _exit() on receiving Goodbye.
> Under what conditions is Shutdown() called on actors? Only when we're at
> "final" shutdown, or also when a manager actor is deleted?
>
Only at final shutdown.
> Is Shutdown() the primary mechanism parent actors have to know that of crashes
> on the other side? I think I'd prefer an explicit OnError() method, which
> eventually we'd require all parent actors to implement.
Yes. The interface is |Shutdown(bool normal)|, so on crashes actors receive |Shutdown(false)|. I don't think the code paths for shutdown vs. crash will be significantly different, so I'd prefer to go through the same interface.
Assignee | ||
Comment 6•15 years ago
|
||
Per IRC conversation yesterday, the consensus now is a unified deletion/shutdown/crash "destroy" method with a |why| parameter naming which condition triggered destruction. After thinking on it overnight, I've come around to the name "ActorDestroy", let's do it.
It would be invoked as follows
Normal deletion |Foo::Send__delete__(foo, ...)|:
(1) foo->Recv__delete__(...)
(2) foo->ActorDestroy(Deleted)
(3) fooParent->DeallocFoo(foo)
Parent deleted |FooParent::Send__delete__(fooParent, ...)|:
(1) fooParent->Recv__delete__(...)
(2) foo->ActorDestroy(ParentDeleted) [etc. for foo's siblings]
(3) fooParent->ActorDestroy(Deleted)
(4) fooParent->DeallocFoo(foo)
(5) fooGrandparent->DeallocFooParent(fooParent) [if fooParent isn't toplevel]
Normal shutdown:
equivalent to the hypothetical Toplevel::Send__delete(toplevel), except that |why| is NormalShutdown
Abnormal shutdown:
equivalent to normal shutdown, except that |why| is AbnormalShutdown
Thoughts?
Comment 7•15 years ago
|
||
yes!
Updated•15 years ago
|
Attachment #413044 -
Flags: review?(benjamin)
Assignee | ||
Comment 8•15 years ago
|
||
Attachment #413044 -
Attachment is obsolete: true
Attachment #413701 -
Flags: review?(benjamin)
Attachment #413044 -
Flags: review?(bent.mozilla)
Assignee | ||
Updated•15 years ago
|
Attachment #413701 -
Flags: review?(bent.mozilla)
Comment on attachment 413701 [details] [diff] [review]
v2, unified deletion/shutdown/crash handling
LGTM
Attachment #413701 -
Flags: review?(bent.mozilla) → review+
Updated•15 years ago
|
Attachment #413701 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 10•15 years ago
|
||
While working on other bugs, I found a few bugs in this code, and decided to add an IPDL unit test. I also found that I had omitted the call to |XRE_ShutdownChildProcess()| in the plugin child process; I added that, but needed to juggle the implementation to suit my needs.
My spidey-sense tells me just to push these along with the other changes, but I'd be remiss if I didn't at least offer the followup r?.
Attachment #415575 -
Flags: review?(benjamin)
Assignee | ||
Updated•15 years ago
|
Attachment #415575 -
Flags: review?(bent.mozilla)
Updated•15 years ago
|
Attachment #415575 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 11•15 years ago
|
||
Just for the record, I'm going to push both these as a single cset.
Assignee | ||
Comment 12•15 years ago
|
||
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Updated•15 years ago
|
Attachment #415575 -
Flags: review?(bent.mozilla)
You need to log in
before you can comment on or make changes to this bug.
Description
•