Open Bug 1748920 Opened 3 years ago Updated 3 years ago

Add docs for IPC_OK and IPC_FAIL*

Tracking

()

Status:

NEW

People

(Reporter: handyman, Unassigned)

References

Details

David Parks [:handyman]

Reporter

Description

•

3 years ago

Came up in discussion on #ipc. Some important bits:

nika: IPC_FAIL can be thought of like a hard crash assertion in most cases in that it usually either crashes the current process or causes the peer process to crash. So you want to use it for validation checks on ipc messages and state to validate e.g. that the message has a valid payload.

Jens Stutte (he/him): [...] So some words about consequences would be really helpful!

nika: Yeah, well to be more specific, it causes a ProcessingError error to be sent to the MessageChannel backend, which is handled differently for every actor

nika: Yeah, it's probably best to assume it's fatal unless you know the specific handling for your actor

Jens Stutte [:jstutte]

Updated

•

3 years ago

Comment 1

•

3 years ago

I encountered an interesting pattern in bug 1750525 and I assume we should follow the following rule of thumb, too:

"Do not IPC_FAIL on a failing SendXxxx dispatch, as this already indicates a dead channel/actor, just ignore and bail out to avoid actor delete ping-pong. Note that dropped messages will be reported already by the IPC sub-system, so no need to even warn."

Am I getting this right?

Flags: needinfo?(davidp99)

David Parks [:handyman]

Reporter

Comment 2

•

3 years ago

It's fair to say that a failed Send indicates a dead actor, in that the actor won't do more sending / receiving. There isn't really any actor tear-down ping-pong (the endpoints can't communicate) so the substantive difference is the behavior of ProcessingError, which you have already been dealing with.

Failed Sends are indeed written to the terminal. I take it you are talking about SendFoo being done inside a RecvBar (same actor or at least managed-actor-tree) and what RecvBar returns. RecvBar won't generally be logged to the console as failed if you don't return IPC_FAIL but you could expect it to if the message is sync, since the response can't be delivered. But there would be no reason to look at an async handler as having failed. So you would want to log a message about that.

Flags: needinfo?(davidp99)

Jens Stutte [:jstutte]

Comment 3

•

3 years ago

Probably I need to be more precise in identifying the pattern. At least in the IndexedDB code I looked at so far it seems to be mostly Send__delete__. Sometimes there are more than two actors involved, though. But I'd still say that "You wanted to kill something, you find it dead - be happy" sounds better to me than "You wanted to kill something, you find it dead - potentially go kill yourself and/or others".

In particular, the generated Send__delete__ code already emits a warning if the actor is dead, so I'd definitely think it's fine to ignore those like proposed here.

In general, I might be overly paranoid about the consequences of an IPC_FAIL, but what I really want to avoid is something like the following situation:

Let's assume, we have two processes, P and C, that are connected via two IPC channels A and B.
P asks C asynchronously for something on channel A, further processing in P might depend on the result (imagine some shutdown ack)
In the meantime C asks P synchronously for something different on channel B, P answers IPC_FAIL.
What happens then depends on what we decide to do as a reaction on that fail. Let's assume/fear the worst case and we just crash C.
P will never receive the answer on channel A and might not have proper reactions in place on channel A destroy that unblock the situation, and thus something hangs

Obviously this can become even more complex having more actors and/or processes that interact. And the reaction is something the actor specialization decides, not the IPC system itself. It is thus clear that some if not all of the responsibility (like "and might not have proper reactions in place on channel A destroy") is outside IPC, too. This is also quite theoretic, I have no concrete reason to assume from current crash-stat data that this happens often. Still the API design does not allow for other standard ways of error reporting (different severity levels or such), which makes it kind of easy to put more IPC_FAIL than good for us (which we will now hopefully catch better with D135238 anyway).

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Add docs for IPC_OK and IPC_FAIL*

Categories

(Core :: IPC, task)

Tracking

()

People

(Reporter: handyman, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Comment 3