Bug 867013 (Closed) • Add urgent message semantics
Opened 12 years ago • Closed 12 years ago
Component: Core :: IPC (defect)
Status: RESOLVED FIXED
Target Milestone: mozilla25
Tracking: firefox24 fixed
Reporter: dvander • Assignee: dvander
Attachments: 1 file, 2 obsolete • 34.84 KB patch • cjones: review+
Bill McCloskey and I have been trying to use CPOWs to shim addon support into multi-process Firefox. The idea is that the child process can send a message to the parent process containing a "cross-process object wrapper", and the parent process can then use JSAPI on this wrapper as if the object were in the same process.
The problem is that existing IPC semantics do not quite cover the use case we need. If the child is blocked on the parent process (for example, it called sendSyncMessage()), then it is unable to process any incoming queries from the parent.
We could attempt to fix this by using RPC semantics on both ends, but this has event ordering problems. For example, the parent can send multiple messages to the client at once. If the first of these messages causes the child to "block" with an RPC call, it will still continue to process the rest of the incoming messages. The state when the CPOW query arrives is then different from what it would be had the JSAPI call occurred in-process.
Therefore the child sometimes needs to block in such a way that *only* CPOW messages are processed. To accomplish this we introduced "urgent" messages. Urgent messages are restricted sync messages that can nest. The semantics differ in that:
* An urgent message can be sent from the parent process.
* An urgent message cannot be initiated if awaiting an rpc, sync, or urgent reply.
* If the main thread is blocked on an RPC call or synchronous reply, then any urgent messages will be dispatched as if the main thread were not waiting.
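To make the third rule concrete, here is a minimal sketch (not the actual patch; the helper names like DispatchUrgent are hypothetical) of what the child's wait loop looks like while blocked on a synchronous reply:

    // Sketch of the worker thread's wait loop while awaiting a sync reply.
    while (!mReplyArrived && Connected()) {
        WaitForNotify();                 // sleep until a message or the reply lands

        // Urgent messages jump the queue and are dispatched immediately,
        // even though we are still blocked awaiting our sync reply.
        while (!mUrgent.empty()) {
            Message msg = mUrgent.front();
            mUrgent.pop_front();
            DispatchUrgent(msg);
        }
        // All other async/sync/rpc traffic stays queued until the reply
        // arrives and the main thread unwinds back to the event loop.
    }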
With these semantics we are hoping to mitigate potential deadlock scenarios that would be possible if synchronous calls could simply nest outright. We're hoping that's good enough for most CPOW usage :)
So far, it seems to have worked. We've been able to shim parts of the frontend and make out-of-the-box Adblock + NoScript work in multiprocess Firefox. Hopefully by getting this onto mozilla-central we can begin to experiment more.
The patch shouldn't affect any existing IPC/IPDL semantics, though the IPDL changes are kind of a hack. I'm currently working on getting tests written, but I'd like to throw the patch up for comments in the meantime.
Attachment #743382 - Flags: review?(cjones.bugs)
Comment 1 • 12 years ago
cjones is gone, no?
Comment 2 • 12 years ago
As mentioned in email a few months ago, I have serious concerns about this approach. cc'ing bent, who owns IPDL/IPC now.
> We could attempt to fix this by using RPC semantics on both ends, but this
> has event ordering problems. For example, the parent can send multiple
> messages to the client at once. If the first of these messages causes the
> child to "block" with an RPC call, it will still continue to process the
> rest of the incoming messages. The state when the CPOW query arrives is then
> different from what it would be had the JSAPI call occurred in-process.
I don't understand what this means, or why it's necessarily bad. The parent is not blocked and has already sent the async messages, so its state is still going to be mutating. The only thing that really changes is when the child becomes aware of that mutation, right?
What happens when either the child or the parent starts spinning nested event loops as a result of one of these messages? Are async messages sent from within the nested loop delivered or queued? How does that interact with the existing queue of pending async messages? The whole purpose of the existing RPC semantics was to ensure that unrelated async message traffic was delivered in order, so that people could write correctly ordered state machines (either by hand or using IPDL state machinery).
> Therefore the child sometimes needs to block in such a way that *only* CPOW
> messages are processed. To accomplish this we introduced "urgent" messages.
> Urgent messages are restricted sync messages that can nest.
The primary difference between "sync" and "rpc" in the current setup is whether they can nest, and whether there can be any intervening messages. So please don't use the term "sync". "blocking" would be better here.
> The semantics
> differ in that:
> * An urgent message can be sent from the parent process.
But not from the child? A CPOW message from the child will always be delivered using RPC semantics?
> * An urgent message cannot be initiated if awaiting an rpc, sync, or urgent
> reply.
Initiated by the receiving process, you mean? Or the sending process? Presumably the sending process is blocked anyway and won't be initiating anything. What message type can be nested inside an urgent message by the receiving process? Won't any function that takes a callback require nesting?
> * If the main thread is blocked on an RPC call or synchronous reply, then
> any
> urgent messages will be dispatched as if the main thread were not waiting.
You mean the main thread of the receiving (child) process? We will need to audit the existing sync messages to make sure that they all happen at "safe points" where content code is allowed to run. I am certain that there are existing IPDL clients in necko who use sync messages specifically because it is currently not safe to run content code.
I strongly suspect that it would be a safer overall experience to at least have incoming sync messages nest within the urgent messages, rather than the other way around.
> So far, it seems to have worked. We've been able to shim parts of the
> frontend and make out-of-box Adblock + Noscript work in multiprocess
> Firefox. Hopefully by getting this on mozilla-central we can begin to
> experiment more.
>
> The patch shouldn't affect any existing IPC/IPDL semantics, though the IPDL
> changes are kind of a hack. I'm currently working on getting tests written,
> but I'd like to throw the patch up for comments in the meantime.
I'm very worried about making sure that message dispatch can actually be sound before we commit a lot to it. With plugins we've had to deal with racing RPC messages and instability caused by code which assumes certain orderings for years. It's really easy to construct degenerate cases with nested event loops which are bound to break, and so we've ended up doing horrible prevention techniques like bug 785348 where if you start to spin nested loops, we neuter part of the plugin (painting!) so that we don't have exploitable security issues, but the browser doesn't work either. Since we know that content can spin nested event loops by design, "seems to have worked" isn't really a high enough bar.
I don't know what this means for getting anything landed in central off-by-default (bent will have to comment on the IPDL changes directly).
Comment 3 • 12 years ago (Assignee)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #1)
> cjones is gone, no?
Chris has generously offered to review this for us.
Here's some more background on what we're doing. The main use case we
have for CPOWs is as follows:
- Some event triggers in the content process.
- The content process informs the parent process of the event by
sending a message and blocking on the result (in case a result is
expected from the event handler). Arguments to the event handler
that can't easily be sent over an IPC channel are sent as CPOWs.
- The parent process triggers any handlers/observers for the event,
which may be in addon code or in Firefox code itself.
- Whenever a handler tries to get or set a property on the CPOW, the
parent sends a message back to the child and waits for the result.
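As a rough sketch of that last step (hypothetical names throughout; CPOWProxy, SendGetProperty, and JSVariant are illustrative, not the real interfaces), a parent-side property get might look like:

    // Parent-side CPOW proxy trap: forward a property get to the child
    // over an urgent message and block until it answers. The child can
    // answer even though it is itself blocked in the sendSyncMessage()
    // that delivered the original event.
    bool CPOWProxy::GetProperty(const nsString& aName, JSVariant* aResult)
    {
        if (!SendGetProperty(mObjId, aName, aResult))
            return false;  // channel error, or the child died
        return true;
    }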
We're imposing a number of restrictions. First, we only allow CPOWs
from the parent into the child. The child isn't allowed to see any
parent objects. This eliminates the need for any kind of crazy
distributed garbage collection. Probably the most serious effect of this
restriction is that the parent can't register callbacks on child
objects (at least not via CPOWs).
The second restriction, as David mentioned, is that we're not going to
allow the child to send out any messages while it's responding to a
CPOW request. That simplifies the possible set of behaviors, and so
far it doesn't seem like we need to send messages from the child at
that time.
We're also considering *requiring* that the child is in a known good
state before CPOWs are used. We could do that with two additional
restrictions. First, we could require the child to send CPOWs to the
parent only via some blocking send call. And second, we could "sever"
the CPOWs once that blocking call returns; the severing mechanism
would be similar to what we do to cross-compartment wrappers when we
nuke a compartment. It's not clear if we'll need these restrictions,
but we have them to fall back on if necessary.
We started out trying to use RPC messages to implement this stuff. In
theory, the child should be able to inform the parent of the event via
an RPC message, and then the parent would send any CPOW queries via
RPC as well. However, we ran into some unexpected behavior. Imagine the
parent sends these messages to the child:
1. create object named A
2. do something with A
Consider what happens if, while processing the first message, an event
handler fires in the child that causes it to send an RPC to the
parent. While it's waiting for the RPC response, the child tries to
process message 2, which fails because it hasn't finished creating
object A yet.
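A sketch of the hazard as child-side handlers (the message and helper names are made up for illustration; they are not the real protocol):

    // With current RPC semantics, blocking inside the first handler
    // lets message 2 dispatch before message 1 has finished.
    struct ChildActor {
        Object* mObjectA = nullptr;

        bool RecvCreateObjectA() {
            FireSomeEvent();           // an event handler fires here...
            Reply reply;
            SendQueryParent(&reply);   // ...and blocks on an RPC to the parent;
                                       // RecvDoSomethingWithA can run reentrantly now
            mObjectA = MakeObject();   // too late: message 2 already looked for A
            return true;
        }

        bool RecvDoSomethingWithA() {
            return UseObject(mObjectA); // fails: still null when run reentrantly
        }
    };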
The specific problem we ran into was where messages 1 and 2 were about
opening a tab and running loadFrameScript. We talked a little with
Chris Jones about this issue, and he said that the current RPC
behavior is needed for NPAPI stuff. Benjamin, it sounds like maybe
that's what you were talking about with nested event loops?
In any case, the current behavior makes it difficult to use RPC
messages to do what we need. We didn't want to change the existing
semantics for RPC messages since the plugin code presumably depends
on it. Our alternative is for the child to use sendSyncMessage to
inform the parent that an event has fired, and for the child to use
urgent messages for CPOW stuff. That way, the child won't process
anything except CPOW messages while it waits for the event response.
Comment 5 • 12 years ago
> 1. create object named A
> 2. do something with A
> Consider what happens if, while processing the first message, an event handler
> fires in the child that causes it to send an RPC to the parent. While it's
> waiting for the RPC response, the child tries to process message 2, which
> fails because it hasn't finished creating object A yet.
Necko ran into this problem: we need to dispatch the nsIStreamListener/nsIRequestObserver functions (OnStartRequest, OnDataAvailable, OnStopRequest) in serial order, but since we send them async from the parent, they sit in the child's inbound IPDL queue, and any rpc/sync message that happens during one of them (in our case, someone using synchronous XHR from OnStartRequest) would cause the next message to get dispatched before the previous one completed. For necko we solved this by explicitly queuing the IPDL messages; the code is here in case it's useful to you:
http://mxr.mozilla.org/mozilla-central/source/netwerk/ipc/ChannelEventQueue.h
(I was never clear as to why we couldn't just have IPDL provide a "no reentrancy within a protocol instance" guarantee internally, at least on an opt-in basis. It would also be more efficient, since we currently have to copy messages to queue them.)
Of course the necko case is pretty easy--we just need a particular channel's Start/Data/Stop to be in order. It sounds like CPOWs might need to have some guarantee that different CPOWs are all started and finished serially?
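For reference, the pattern ChannelEventQueue uses boils down to something like this (a heavily simplified sketch with hypothetical types; the real header is linked above): while the queue is "busy", incoming events are copied onto it instead of running, and they are flushed in order once the current event completes.

    #include <deque>
    #include <functional>

    class EventQueue {
        std::deque<std::function<void()>> mQueue;
        bool mBusy = false;
    public:
        void RunOrEnqueue(std::function<void()> aEvent) {
            if (mBusy) {                 // an event is already running
                mQueue.push_back(std::move(aEvent));  // copy/queue it
                return;
            }
            mBusy = true;
            aEvent();
            while (!mQueue.empty()) {    // flush, in order, anything that
                auto next = std::move(mQueue.front()); // arrived meanwhile
                mQueue.pop_front();      // (e.g. during a sync XHR spin)
                next();
            }
            mBusy = false;
        }
    };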
> I am certain that there are existing IPDL clients in necko who use sync
> messages specifically because it is currently not safe to run content code.
I don't think that's right. We use only async msgs in necko except for cookies, which uses a sync message for GetCookie (used IIRC only for document.cookie reads).
Comment 6 • 12 years ago
We discussed this over the phone. While I would like to migrate rpc semantics to the more sensible approach, (i) I don't remember what motivated us to choose the current semantics in the first place; (ii) it's way too scary wrt NPAPI. We discussed implementing an additional semantics, but that's a large amount of work, and the impact of the issue atm seems pretty small in scope. And there's a patch in hand here.
Comment 7 • 12 years ago
Sorry dvander, I got wrapped up in some other things today. Review here is at the top of my list.
Comment 8 • 12 years ago
Comment on attachment 743382 [details] [diff] [review]
v1
Make sure you're building with
ac_add_options --enable-ipdl-tests
The C++ tests are failing with this patch. You can run them with
make -C $objdir/ipc/ipdl/test/cxx check
They hung at TestSyncError 3/3 runs with this patch applied, and completed successfully 3/3 times without it.
This patch also really needs tests. ipc/ipdl/test/README.txt briefly explains how the two testsuites (compiler-only and C++) work. You can run all the tests with
make -C $objdir/ipc/ipdl/test check
The test suites aren't rocket science, should be able to cargo-cult your way in. Feel free to ping me if you get stuck.
r-'ing based on that, but will have comments about the substantial content of the patch in a bit.
Attachment #743382 - Flags: review?(cjones.bugs) → review-
Comment 9 • 12 years ago
Comment on attachment 743382 [details] [diff] [review]
v1
>diff --git a/ipc/glue/RPCChannel.cpp b/ipc/glue/RPCChannel.cpp
> bool
> RPCChannel::Call(Message* _msg, Message* reply)
> {
>+ NS_ABORT_IF_FALSE(!mPendingReply, "cannot nest sync");
>+ if (mPendingReply)
>+ return false;
This should be an assert.
> nsAutoPtr<Message> msg(_msg);
> AssertWorkerThread();
> mMonitor->AssertNotCurrentThreadOwns();
>- RPC_ASSERT(!ProcessingSyncMessage(),
>- "violation of sync handler invariant");
I don't quite understand why this is being removed.
>@@ -188,45 +193,58 @@ RPCChannel::Call(Message* _msg, Message*
>+ else if (!mUrgent.empty()) {
>+ recvd = mUrgent.front();
>+ mUrgent.pop_front();
>+ }
I'm not sure it matters, but we need to think about whether we should
process out-of-turn replies before urgent messages. My intuition is
that it doesn't matter, but I'm not sure.
>@@ -373,21 +390,26 @@ RPCChannel::OnMaybeDequeueOne()
>+ MessageQueue *queue = mUrgent.empty()
>+ ? mPending.empty()
>+ ? &mNonUrgentDeferred
>+ : &mPending
>+ : &mUrgent;
Shouldn't we flush the non-urgent deferred queue before the pending queue? Actually, it's not clear to me that those queues won't overlap, by which I mean there could be both older and newer messages in mPending than in mNonUrgentDeferred. We need to think about this so as not to trash in-order delivery :).
>@@ -693,16 +715,26 @@ RPCChannel::OnMessageReceivedFromLink(co
>+ // Urgent messages must be delivered immediately.
>+ if (msg.priority() == IPC::Message::PRIORITY_HIGH) {
>+ mUrgent.push_back(msg);
>+ if (!AwaitingSyncReply() && (0 == StackDepth() && !mBlockedOnParent))
>+ mWorkerLoop->PostTask(FROM_HERE, new DequeueTask(mDequeueOneTask));
>+ else
>+ NotifyWorkerThread();
>+ return;
I don't quite understand the logic of duplicating this code vs. choosing the queue based on priority, below.
>diff --git a/ipc/glue/RPCChannel.h b/ipc/glue/RPCChannel.h
>+ // List of async and sync messages that have been received while waiting
>+ // for an urgent reply, that need to be deferred until that reply has been
>+ // received.
>+ MessageQueue mNonUrgentDeferred;
These can be rpc as well, right?
>diff --git a/ipc/glue/SyncChannel.cpp b/ipc/glue/SyncChannel.cpp
The original goal was to "layer" messaging semantics on each other,
async < sync < rpc, but this code has gotten cross-invasive enough
that that's turned out to be a bad idea. Sorry about that.
> bool
> SyncChannel::Send(Message* _msg, Message* reply)
> {
>+ NS_ABORT_IF_FALSE(!mPendingReply, "cannot nest sync");
>+ if (mPendingReply)
>+ return false;
Again, I'd prefer this remain an assertion unless there's a compelling
reason otherwise.
I need to come back and take another pass over the SyncChannel
changes, but I think there's enough to discuss in the meantime.
>diff --git a/ipc/ipdl/ipdl/ast.py b/ipc/ipdl/ipdl/ast.py
The compiler changes look fine to me.
Comment 10 • 12 years ago
It strikes me that if
> We started out trying to use RPC messages to implement this stuff. In
> theory, the child should be able to inform the parent of the event via
> an RPC message, and then the parent would send any CPOW queries via
> RPC as well. However, we ran into some unexpected behavior. Imagine the
> parent sends these messages to the child:
> 1. create object named A
> 2. do something with A
> Consider what happens if, while processing the first message, an event
> handler fires in the child that causes it to send an RPC to the
> parent. While it's waiting for the RPC response, the child tries to
> process message 2, which fails because it hasn't finished creating
> object A yet.
is the only problem you're running into, that is, an rpc <--> rpc race between parent and child, then you can use RPC race resolution to make sure the parent "wins" for the messages you care about. The example is here:
http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginMessageUtils.cpp#70
However, race resolution doesn't work for async <--> rpc or sync <--> rpc races, since messaging semantics dictate how those should be resolved.
Certainly worth a try!
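For what that might look like: a sketch in the style of the linked plugin code (the RacyRPCPolicy names are recalled from the old RPCChannel API and the message type is made up; treat the whole thing as approximate, not a working hook):

    // When both sides have an in-flight RPC, a mediation hook picks a
    // winner for specific message types instead of the default policy.
    RPCChannel::RacyRPCPolicy
    MediateRace(const Message& parent, const Message& child)
    {
        switch (parent.type()) {
        case PContent::Msg_CreateObjectA:      // hypothetical message type
            return RPCChannel::RRPParentWins;  // parent's RPC dispatches first
        default:
            return RPCChannel::RRPChildWins;   // default resolution
        }
    }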
Comment 11 • 12 years ago
Comment on attachment 743382 [details] [diff] [review]
v1
>diff --git a/ipc/glue/SyncChannel.cpp b/ipc/glue/SyncChannel.cpp
>+#include "mozilla/ipc/RPCChannel.h"
Ew :).
>
> #include "nsDebug.h"
> #include "nsTraceRefcnt.h"
>
> using mozilla::MonitorAutoLock;
>
> template<>
> struct RunnableMethodTraits<mozilla::ipc::SyncChannel>
>@@ -57,22 +58,59 @@ bool SyncChannel::sIsPumpingMessages = f
>
> bool
> SyncChannel::EventOccurred()
> {
> AssertWorkerThread();
> mMonitor->AssertCurrentThreadOwns();
> NS_ABORT_IF_FALSE(AwaitingSyncReply(), "not in wait loop");
>
>- return (!Connected() || 0 != mRecvd.type() || mRecvd.is_reply_error());
>+ return !Connected() ||
>+ mRecvd.type() != 0 ||
>+ mRecvd.is_reply_error() ||
>+ !mUrgent.empty();
>+}
>+
>+bool
>+SyncChannel::ProcessUrgentMessages()
>+{
>+ while (!mUrgent.empty()) {
>+ Message msg = mUrgent.front();
>+ mUrgent.pop_front();
>+
>+ MOZ_ASSERT(msg.priority() == IPC::Message::PRIORITY_HIGH);
>+
>+ {
>+ MOZ_ASSERT(msg.is_sync() || msg.is_rpc());
>+
>+ MonitorAutoUnlock unlock(*mMonitor);
>+ SyncChannel::OnDispatchMessage(msg);
>+ }
>+
>+ // Check state that could have been changed during dispatch.
>+ if (!Connected()) {
>+ ReportConnectionError("SyncChannel");
>+ return false;
>+ }
>+
>+ // We should not have received another synchronous reply,
>+ // because we cannot send synchronous messages in this state.
>+ MOZ_ASSERT(mRecvd.type() == 0);
>+ }
>+
>+ return true;
> }
>
> bool
> SyncChannel::Send(Message* _msg, Message* reply)
> {
>+ NS_ABORT_IF_FALSE(!mPendingReply, "cannot nest sync");
>+ if (mPendingReply)
>+ return false;
Again, should be an assertion.
>@@ -89,72 +127,104 @@ SyncChannel::Send(Message* _msg, Message
> while (1) {
>- bool maybeTimedOut = !SyncChannel::WaitForNotify();
>+ // if a handler invoked by *Dispatch*() spun a nested event
>+ // loop, and the connection was broken during that loop, we
>+ // might have already processed the OnError event. if so,
>+ // trying another loop iteration will be futile because
>+ // channel state will have been cleared
>+ if (!Connected()) {
Please write a test for this case if you can.
>+ // Process all urgent messages. We forbid nesting synchronous sends,
>+ // so mPendingReply etc will still be valid.
>+ if (!ProcessUrgentMessages())
>+ return false;
I'm starting to get the feeling that we should do away with the
SyncChannel/RPCChannel distinction, since the wait loops look so
similar now. That should make the changes in this patch easier to
understand and maintain.
I can't say I like the SyncChannel changes, since we're importing more
logic from RPCChannel, but I think I understand them and I think they
look OK. But let's see if we can get away with RPC race resolution.
Comment 12 • 12 years ago (Assignee)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> Comment on attachment 743382 [details] [diff] [review]
> v1
>
> Make sure you're building with
>
> ac_add_options --enable-ipdl-tests
>
> The C++ tests are failing with this patch. You can run them with
>
> make -C $objdir/ipc/ipdl/test/cxx check
>
> They hung at TestSyncError 3/3 runs with this patch applied, and completed
> successfully 3/3 times without it.
>
> This patch also really needs tests. ipc/ipdl/test/README.txt briefly
> explains how the two testsuites (compiler-only and C++) work. You can run
> all the tests with
>
> make -C $objdir/ipc/ipdl/test check
>
> The test suites aren't rocket science, should be able to cargo-cult your way
> in. Feel free to ping me if you get stuck.
>
> r-'ing based on that, but will have comments about the substantial content
> of the patch in a bit.
Okay, I fixed that test, thanks. I'll have tests for the next patch.
Comment 13 • 12 years ago (Assignee)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
> >@@ -188,45 +193,58 @@ RPCChannel::Call(Message* _msg, Message*
>
> >+ else if (!mUrgent.empty()) {
> >+ recvd = mUrgent.front();
> >+ mUrgent.pop_front();
> >+ }
>
> I'm not sure it matters, but we need to think about whether we should
> process out-of-turn replies before urgent messages. My intuition is
> that it doesn't matter, but I'm not sure.
Right now we do process out-of-turn replies first. I don't think it matters, at least not yet.
> >@@ -373,21 +390,26 @@ RPCChannel::OnMaybeDequeueOne()
>
> >+ MessageQueue *queue = mUrgent.empty()
> >+ ? mPending.empty()
> >+ ? &mNonUrgentDeferred
> >+ : &mPending
> >+ : &mUrgent;
>
> Shouldn't we flush the non-urgent deferred queue before the pending
> queue? Actually, it's not clear to me that those queues won't
> overlap, by which I mean there could be both older and newer messages
> in mPending than in mNonUrgentDeferred. We need to think about so as
> not to trash in-order delivery :).
Hrm, good point. I'm still thinking about this.
> The original goal was to "layer" messaging semantics on each other,
> async < sync < rpc, but this code has gotten cross-invasive enough
> that that's turned out to be a bad idea. Sorry about that.
Yeah, I agree; Sync/RPC, and maybe even the Async code, should all be merged together someday. I might attempt that if nesting urgent messages gets too complicated.
Comment 14 • 12 years ago (Assignee)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #11)
> > while (1) {
> >- bool maybeTimedOut = !SyncChannel::WaitForNotify();
> >+ // if a handler invoked by *Dispatch*() spun a nested event
> >+ // loop, and the connection was broken during that loop, we
> >+ // might have already processed the OnError event. if so,
> >+ // trying another loop iteration will be futile because
> >+ // channel state will have been cleared
> >+ if (!Connected()) {
>
> Please write a test for this case if you can.
This is pre-existing code, it just got indented.
> I can't say I like the SyncChannel changes, since we're importing more
> logic from RPCChannel, but I think I understand them and I think they
> look OK. But let's see if we can get away with RPC race resolution.
We were encountering ordering problems with async messages coming from chrome. If I'm understanding it correctly, RPC race resolution would only work if we changed those messages to be rpc?
Comment 15 • 12 years ago
(In reply to David Anderson [:dvander] from comment #14)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #11)
>
> > I can't say I like the SyncChannel changes, since we're importing more
> > logic from RPCChannel, but I think I understand them and I think they
> > look OK. But let's see if we can get away with RPC race resolution.
>
> We were encountering ordering problems with async messages coming from
> Chrome. If I'm understanding it correctly - the RPC race resolution would
> only work if we changed those messages to be rpc?
Correct, rpc doesn't "race" with async (or with sync), semantically. OK then.
Comment 16 • 12 years ago (Assignee)
Addresses review comments, and adds some tests for urgent messages.
re: mPending versus mNonUrgentDeferred and in-order delivery problems. You're right, I flipped the order in which the queues are resolved, thanks. Whether the queues can still be out-of-order...
RPCChannel::OnMessageReceivedFromLink will fill mPending with any message that's not a reply. RPCChannel::Call()'s wait loop will then pop mPending, and either immediately process it or move it into mNonUrgentDeferred. The question is whether we can add to mPending, without moving to mNonUrgentDeferred, or without processing mNonUrgentDeferred first.
We don't allow anything to nest on top of urgent messages, so we shouldn't be spinning up nested event loops or anything. Even once we do nest urgent messages (on top of urgent messages only), as long as no rpc/sync stuff can happen on top, it should be okay.
I don't know enough to say for sure, so I'll try asserting to that effect.
Attachment #743382 - Attachment is obsolete: true
Attachment #755133 - Flags: review?(cjones.bugs)
Comment 17 • 12 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
> >diff --git a/ipc/glue/RPCChannel.h b/ipc/glue/RPCChannel.h
>
> >+ // List of async and sync messages that have been received while waiting
> >+ // for an urgent reply, that need to be deferred until that reply has been
> >+ // received.
> >+ MessageQueue mNonUrgentDeferred;
>
> These can be rpc as well, right?
They cannot.
Comment 18 • 12 years ago
Comment on attachment 755133 [details] [diff] [review]
v2
>diff --git a/ipc/glue/RPCChannel.cpp b/ipc/glue/RPCChannel.cpp
> bool
> RPCChannel::Call(Message* _msg, Message* reply)
> {
>+ MOZ_ASSERT(!mPendingReply);
Nit-ish: Please use |RPC_ASSERT(!mPendingReply)|, which also logs info
about channel state like pseudo-stack.
>@@ -161,45 +166,58 @@ RPCChannel::Call(Message* _msg, Message*
>+ if (urgent && recvd.priority() != IPC::Message::PRIORITY_HIGH) {
>+ // If we're waiting for an urgent reply, don't process any
>+ // messages yet.
>+ mNonUrgentDeferred.push_back(recvd);
>+ } else {
>+ MonitorAutoUnlock unlock(*mMonitor);
>+ CxxStackFrame f(*this, IN_MESSAGE, &recvd);
>+ AsyncChannel::OnDispatchMessage(recvd);
>+ }
> continue;
> }
>
> if (recvd.is_sync()) {
>+ if (urgent && recvd.priority() != IPC::Message::PRIORITY_HIGH) {
>+ // If we're waiting for an urgent reply, don't process any
>+ // messages yet.
>+ mNonUrgentDeferred.push_back(recvd);
>+ } else {
>+ RPC_ASSERT(mPending.empty(),
>+ "other side should have been blocked");
>+ MonitorAutoUnlock unlock(*mMonitor);
>+ CxxStackFrame f(*this, IN_MESSAGE, &recvd);
>+ SyncChannel::OnDispatchMessage(recvd);
>+ }
Nit-ish: I'd prefer to avoid the code duplication here with
if (!recvd.is_rpc()) {
if (urgent && ...) {
...
} else if(!recvd.is_sync()) {
...
} else {
...
}
continue;
}
>- if (0 == StackDepth()) {
>- // the worker thread might be idle, make sure it wakes up
>+ // There are three cases we're concerned about, relating to the state of
>+ // the main thread:
>+ //
>+ // (1) We are waiting on a sync reply - main thread is blocked on the IPC monitor.
>+ // - If the message is high priority, we wake up the main thread to
>+ // deliver the message. Otherwise, we leave it in the mPending queue,
>+ // posting a task to the main event loop, where it will be processed
>+ // once the synchronous reply has been received.
>+ //
>+ // (2) We are waiting on an RPC reply - main thread is blocked on the IPC monitor.
>+ // - Always wake up the main thread to deliver the message.
>+ //
>+ // (3) We are not waiting on a reply.
>+ // - We post a task to the main event loop.
>+ //
>+ bool waiting_rpc = (0 != StackDepth());
>+ bool urgent = (msg.priority() == IPC::Message::PRIORITY_HIGH);
>+ if (!waiting_rpc && !(AwaitingSyncReply() && urgent)) {
A subtlety here that I had to reassure myself of is an outgoing sync
message racing with incoming RPC. In that case,
!waiting_rpc = true
(AwaitingSyncReply() && urgent) = false
so we would end up enqueuing the RPC for the main event-loop and not
wake up the SyncChannel waiter. That might seem wrong, but it's
actually the correct way to resolve that semantic race.
With that in mind, it might be clearer to flip the bits we check here
if (waiting_rpc || (AwaitingSyncReply() && urgent)) {
// Always wake up our rpc waiter, and wake up sync waiters for
// urgent messages.
Notify();
} else {
// Worker thread is either not blocked on a reply, or this is an
// incoming RPC that raced with outgoing sync and needs to be
// deferred to a later event-loop iteration.
//...
}
>diff --git a/ipc/ipdl/test/cxx/Makefile.in b/ipc/ipdl/test/cxx/Makefile.in
> TestBadActor \
>+ TestUrgency \
Nit: tab.
> $(NULL)
>
> ifeq ($(OS_ARCH),Linux)
> IPDLTESTS += TestSysVShmem
> endif
>
> EXTRA_PROTOCOLS = \
> TestBridgeSub \
>diff --git a/ipc/ipdl/test/cxx/PTestUrgency.ipdl b/ipc/ipdl/test/cxx/PTestUrgency.ipdl
>+enum {
>+ kFirstTestBegin = 1,
>+ kFirstTestGotReply,
>+ kSecondTestBegin,
>+ kSecondTestGotReply,
>+};
Would be nice to write a protocol state machine for this test and
replace this with asserts over actor |state()|, but as state machines
are currently spec'd, that gets pretty hard with message races (bug
547703 and followups).
OK, this is looking good. These are mostly just stylistic changes,
but I'd like to take one more pass before r+. Still afraid of missing
something :).
Attachment #755133 - Flags: review?(cjones.bugs)
Comment 19 • 12 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #18)
> With that in mind, it might be clearer to flip the bits we check here
>
> if (waiting_rpc || (AwaitingSyncReply() && urgent)) {
> // Always wake up our rpc waiter, and wake up sync waiters for
> // urgent messages.
> Notify();
> } else {
> // Worker thread is either not blocked on a reply, or this is an
> // incoming RPC that raced with outgoing sync and needs to be
> // deferred to a later event-loop iteration.
> //...
> }
I meant to add, but please go with whatever is clearer to you.
Comment 20 • 12 years ago (Assignee)
Nits applied - I agree the inverted logic is easier to read. Getting those conditions right was the most difficult part of this, TBH.
I didn't have a chance to do the protocol state machine yet, but wanted to make sure the rest of the patch was in a landable state since I'm on PTO this week.
Attachment #755133 - Attachment is obsolete: true
Attachment #760294 - Flags: review?(cjones.bugs)
Comment 21 • 12 years ago
(In reply to David Anderson [:dvander] from comment #20)
> Created attachment 760294 [details] [diff] [review]
> v3
>
> Nits applied - I agree the inverted logic is easier to read. Getting those
> conditions right was the most difficult part of this, TBH.
Yeah, it was also the hardest to review :).
> I didn't have a chance to do the protocol state machine yet,
I would skip that. I think all you would gain from that is learning what the limitations in the current system are ;).
> but wanted to
> make sure the rest of the patch was in a landable state since I'm on PTO
> this week.
D'oh, I started traveling the day you sent this :(. Sorry for lag.
Updated • 12 years ago
Attachment #760294 - Flags: review?(cjones.bugs) → review+
Comment 22 • 12 years ago (Assignee)
Comment 23 • 12 years ago
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Comment 24 • 12 years ago
I ended up having to backport this:
https://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?changeset=3266c1d73816
status-firefox24: --- → fixed