Closed Bug 529005 Opened 15 years ago Closed 15 years ago

IPDL: Gracefully handle shutdown and broken sockets

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: cjones, Assigned: cjones)

References

Details

Attachments

(2 files, 2 obsolete files)

backup/WIP 15 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 52.86 KB, patch		Details \| Diff \| Splinter Review
v1 15 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 54.08 KB, patch		Details \| Diff \| Splinter Review
v2, unified deletion/shutdown/crash handling 15 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 59.77 KB, patch	benjamin : review+ bent.mozilla : review+	Details \| Diff \| Splinter Review
some moderately nontrivial additions to the first patch 15 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 15.77 KB, patch	benjamin : review+	Details \| Diff \| Splinter Review

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Description

•

15 years ago

Shutdown will be distinguished from "broken socket"/crash by having the side initiating shutdown send a special GOODBYE message. (Should GOODBYE be allowed to race? Needs more thought; it's easy enough to handle in IPDL code.) When the IPC socket closes after GOODBYE, it's treated as normal exit. The socket closing without GOODBYE is "abnormal exit". In both cases, exit initiates a shutdown sequence. The assumption is that Gecko should have cleaned up its IPDL actors before shutdown, in "application logic." So from IPDL's perspective, shutdown is about closing out actors that should already have been closed. On the parent side, in both DEBUG and OPT builds, and on the child side in DEBUG, the following steps will be taken (1a) recv GOODBYE, close socket (1b) notified of closed socket (crash) (2) For each live actor, in bottom-up order (managed actors before their managers), invoke actor->Shutdown(bool normalExit); Shutdown() defaults to a no-op implementation. (3) After invoking each Shutdown() handler, enqueue a Finalize() event in the main event queue. (4) Finalize(): walk the actor tree bottom-up, delete all live actors In the child processes, in DEBUG builds, after step (4) app shutdown will be initiated. In child processes in OPT builds, all steps above will simply be replaced with an |_exit(0)| from the socket OnClose() handler.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

15 years ago

Assignee: nobody → jones.chris.g

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 1

•

15 years ago

(In reply to comment #0) > (3) After invoking each Shutdown() handler, enqueue a Finalize() event in the > main event queue. > (4) Finalize(): walk the actor tree bottom-up, delete all live actors > Now that I've got this mostly implemented, I think we're going to have to skip posting Finalize() to the event loop and instead immediately start Finalize() after Shutdown(). (I had originally wanted to do this to allow "deferred delete" hacks like we've had to use in NPAPI.) Problem is, shutting down the subprocess might be the last action taken before shutting down XPCOM, so Finalize() might have ended up fighting with quitting the event loop.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 2

•

15 years ago

Attached patch backup/WIP (obsolete) — Details — Splinter Review

Most of the interesting work is finished, needs some more massaging to play well with NPAPI and ContentProcess.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 3

•

15 years ago

Attached patch v1 (obsolete) — Details — Splinter Review

Turned out to be quick and painless to graft in the old plugin shutdown behavior. Ready for r?.

Attachment #413041 - Attachment is obsolete: true

Attachment #413044 - Flags: review?(benjamin)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

15 years ago

Attachment #413044 - Flags: review?(bent.mozilla)

Benjamin Smedberg

Comment 4

•

15 years ago

Comment on attachment 413044 [details] [diff] [review] v1 When the channel receives the goodbye message, it doesn't immediately close itself, but it waits for NotifyMaybeChannelError to inform it about the actual socket closing. Is there any reason for that? Why can't we call NotifyChannelClosed immediately on receiving goodbye? Under what conditions is Shutdown() called on actors? Only when we're at "final" shutdown, or also when a manager actor is deleted? Is Shutdown() the primary mechanism parent actors have to know that of crashes on the other side? I think I'd prefer an explicit OnError() method, which eventually we'd require all parent actors to implement.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 5

•

15 years ago

(In reply to comment #4) > (From update of attachment 413044 [details] [diff] [review]) > When the channel receives the goodbye message, it doesn't immediately close > itself, but it waits for NotifyMaybeChannelError to inform it about the actual > socket closing. Is there any reason for that? Why can't we call > NotifyChannelClosed immediately on receiving goodbye? > To keep state consistent between the IO thread and worker thread. The IO thread posts the Goodbye event, then a "socket closed" event, and there's a race between the IO thread posting "socket closed" and the worker thread processing "recvd Goodbye." It keeps things simpler to wait for "socket closed". That said, in OPT builds in the child process, we probably just want to _exit() on receiving Goodbye. > Under what conditions is Shutdown() called on actors? Only when we're at > "final" shutdown, or also when a manager actor is deleted? > Only at final shutdown. > Is Shutdown() the primary mechanism parent actors have to know that of crashes > on the other side? I think I'd prefer an explicit OnError() method, which > eventually we'd require all parent actors to implement. Yes. The interface is |Shutdown(bool normal)|, so on crashes actors receive |Shutdown(false)|. I don't think the code paths for shutdown vs. crash will be significantly different, so I'd prefer to go through the same interface.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 6

•

15 years ago

Per IRC conversation yesterday, the consensus now is a unified deletion/shutdown/crash "destroy" method with a |why| parameter naming which condition triggered destruction. After thinking on it overnight, I've come around to the name "ActorDestroy", let's do it. It would be invoked as follows Normal deletion |Foo::Send__delete__(foo, ...)|: (1) foo->Recv__delete__(...) (2) foo->ActorDestroy(Deleted) (3) fooParent->DeallocFoo(foo) Parent deleted |FooParent::Send__delete__(fooParent, ...)|: (1) fooParent->Recv__delete__(...) (2) foo->ActorDestroy(ParentDeleted) [etc. for foo's siblings] (3) fooParent->ActorDestroy(Deleted) (4) fooParent->DeallocFoo(foo) (5) fooGrandparent->DeallocFooParent(fooParent) [if fooParent isn't toplevel] Normal shutdown: equivalent to the hypothetical Toplevel::Send__delete(toplevel), except that |why| is NormalShutdown Abnormal shutdown: equivalent to normal shutdown, except that |why| is AbnormalShutdown Thoughts?

Benjamin Smedberg

Comment 7

•

15 years ago

yes!

Benjamin Smedberg

Updated

•

15 years ago

Attachment #413044 - Flags: review?(benjamin)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 8

•

15 years ago

Attached patch v2, unified deletion/shutdown/crash handling — Details — Splinter Review

Attachment #413044 - Attachment is obsolete: true

Attachment #413701 - Flags: review?(benjamin)

Attachment #413044 - Flags: review?(bent.mozilla)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

15 years ago

Attachment #413701 - Flags: review?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 9

•

15 years ago

Comment on attachment 413701 [details] [diff] [review] v2, unified deletion/shutdown/crash handling LGTM

Attachment #413701 - Flags: review?(bent.mozilla) → review+

Benjamin Smedberg

Updated

•

15 years ago

Attachment #413701 - Flags: review?(benjamin) → review+

Benjamin Smedberg

Updated

•

15 years ago

Blocks: OOPP

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 10

•

15 years ago

Attached patch some moderately nontrivial additions to the first patch — Details — Splinter Review

While working on other bugs, I found a few bugs in this code, and decided to add an IPDL unit test. I also found that I had omitted the call to |XRE_ShutdownChildProcess()| in the plugin child process; I added that, but needed to juggle the implementation to suit my needs. My spidey-sense tells me just to push these along with the other changes, but I'd be remiss if I didn't at least offer the followup r?.

Attachment #415575 - Flags: review?(benjamin)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

15 years ago

Attachment #415575 - Flags: review?(bent.mozilla)

Benjamin Smedberg

Updated

•

15 years ago

Attachment #415575 - Flags: review?(benjamin) → review+

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 11

•

15 years ago

Just for the record, I'm going to push both these as a single cset.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 12

•

15 years ago

Pushed http://hg.mozilla.org/projects/electrolysis/rev/7ebfbec7521c

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

15 years ago

Attachment #415575 - Flags: review?(bent.mozilla)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

14 years ago

Depends on: 566054

You need to log in before you can comment on or make changes to this bug.