Mojo ports may not be notified of peer node destruction
Categories
(Core :: IPC, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox127 | --- | fixed |
People
(Reporter: nika, Assigned: nika)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
While investigating bug 1875095, :yannis noticed that in some situations it was possible for a toplevel IPDL protocol to never be notified of the other side actor's destruction if it was caused by a process crash. Thanks to some investigation with a pernosco trace, I was able to narrow down the cause of this to what appears to be an issue with the mojo core ports code we imported in bug 1706374.
Suppose we have 3 nodes: A, B, and C. In this scenario A is the broker, and there are two child-nodes B and C. A creates a port-pair (Ab <-> Ac
), and sends one each to B (Ab => Bb
) and C (Ac => Cc
). Assuming both directions of the proxy bypass occur concurrently, we'll send a number of ObserveProxy
messages between node A and nodes B/C, eventually cleaning up the proxies in A, such that Bb
's peer is Cc
, and vice versa. During this process, we never attempt to send a message directly between nodes B and C.
In NodeController
, direct connections between a pair of nodes are established via the broker node when attempting to send a message directly between nodes, but as we have not attempted to send any messages directly, no such connection has been established. That means that when one of these child nodes dies, the other node will not be notified of the peer being lost, and the IPDL actor will appear to remain open. Only once a message is sent will the death of the peer node be discovered, and the corresponding actor destroyed.
To fix this, the patches in this bug modify the routing code, adding a couple of callbacks when accepting ports over IPC and bypassing proxies which notify NodeController
, allowing it to attempt an introduction eagerly. This helps ensure that actors will reliably be notified when their peer processes die.
In addition, some tweaks to the introduction logic were made to both make introductions happen reliably, and to ensure we clean up missing peer nodes in error conditions.
Assignee | ||
Comment 1•1 year ago
|
||
Suppose we have 3 nodes: A, B, and C. In this scenario A is the broker,
and there are two child-nodes B and C. A creates a port-pair (Ab <->
Ac), and sends one each to B (Ab => Bb) and C (Ac => Cc). Assuming both
directions of the proxy bypass occur concurrently, we'll send a number
of ObserveProxy messages between node A and nodes B/C, eventually
cleaning up the proxies in A, such that Bb's peer is Cc, and vice versa.
During this process, we never attempt to send a message directly between
nodes B and C.
In NodeController, direct connections between a pair of nodes are
established via the broker node when attempting to send a message
directly between nodes, but as we have not attempted to send any
messages directly, no such connection has been established. That means
that when one of these child nodes dies, the other node will not be
notified of the peer being lost, and the IPDL actor will appear to
remain open. Only once a message is sent will the death of the peer node
be discovered, and the corresponding actor destroyed.
To fix this, we modify the routing code, adding a couple of callbacks
when accepting ports over IPC and bypassing proxies which notify
NodeController, allowing it to attempt an introduction eagerly. This
helps ensure that actors will reliably be notified when their peer
processes die.
In addition, some tweaks to the introduction logic were made to both
make introductions happen reliably, and to ensure we clean up missing
peer nodes in error conditions.
Assignee | ||
Updated•1 year ago
|
Comment 3•10 months ago
|
||
There is an r+ patch which didn't land and no activity in this bug for 2 weeks.
:nika, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.
Updated•10 months ago
|
Assignee | ||
Updated•10 months ago
|
Comment 5•10 months ago
|
||
bugherder |
Description
•