Bug 867013 (Closed) • Add urgent message semantics
Opened 12 years ago • Closed 12 years ago
Component: Core :: IPC (defect)
Status: RESOLVED FIXED
Target Milestone: mozilla25
Tracking: firefox24 fixed
Reporter: dvander • Assignee: dvander
Attachments: 1 file, 2 obsolete • 34.84 KB patch • cjones: review+
Bill McCloskey and I have been trying to use CPOWs to shim addon support into multi-process Firefox. The idea is that the child process can send a message to the parent process containing a "cross-process object wrapper", and the parent process can then use JSAPI on this wrapper as if the object were in the same process.
The problem is that existing IPC semantics do not quite cover the use case we need. If the child is blocked on the parent process (for example, it called sendSyncMessage()), then it is unable to process any incoming queries from the parent.
We could attempt to fix this by using RPC semantics on both ends, but this has event ordering problems. For example, the parent can send multiple messages to the client at once. If the first of these messages causes the child to "block" with an RPC call, it will still continue to process the rest of the incoming messages. The state when the CPOW query arrives is then different from what it would be had the JSAPI call occurred in-process.
Therefore the child sometimes needs to block in such a way that *only* CPOW messages are processed. To accomplish this we introduced "urgent" messages. Urgent messages are restricted sync messages that can nest. The semantics differ in that:
* An urgent message can be sent from the parent process.
* An urgent message cannot be initiated if awaiting an rpc, sync, or urgent reply.
* If the main thread is blocked on an RPC call or synchronous reply, then any urgent messages will be dispatched as if the main thread were not waiting.
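To make the third rule concrete, here is a minimal sketch (not the actual patch; the helper names like DispatchUrgent are hypothetical) of what the child's wait loop looks like while blocked on a synchronous reply:

    // Sketch of the worker thread's wait loop while awaiting a sync reply.
    while (!mReplyArrived && Connected()) {
        WaitForNotify();                 // sleep until a message or the reply lands

        // Urgent messages jump the queue and are dispatched immediately,
        // even though we are still blocked awaiting our sync reply.
        while (!mUrgent.empty()) {
            Message msg = mUrgent.front();
            mUrgent.pop_front();
            DispatchUrgent(msg);
        }
        // All other async/sync/rpc traffic stays queued until the reply
        // arrives and the main thread unwinds back to the event loop.
    }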
With these semantics we are hoping to mitigate potential deadlock scenarios that would be possible if synchronous calls could simply nest outright. We're hoping that's good enough for most CPOW usage :)
So far, it seems to have worked. We've been able to shim parts of the frontend and make out-of-the-box Adblock + NoScript work in multiprocess Firefox. Hopefully by getting this onto mozilla-central we can begin to experiment more.
The patch shouldn't affect any existing IPC/IPDL semantics, though the IPDL changes are kind of a hack. I'm currently working on getting tests written, but I'd like to throw the patch up for comments in the meantime.
Attachment #743382 - Flags: review?(cjones.bugs)
Comment 1 • 12 years ago
cjones is gone, no?
Comment 2 • 12 years ago
As mentioned in email a few months ago, I have serious concerns about this approach. cc'ing bent, who owns IPDL/IPC now.
> We could attempt to fix this by using RPC semantics on both ends, but this
> has event ordering problems. For example, the parent can send multiple
> messages to the client at once. If the first of these messages causes the
> child to "block" with an RPC call, it will still continue to process the
> rest of the incoming messages. The state when the CPOW query arrives is then
> different from what it would be had the JSAPI call occurred in-process.
I don't understand what this means, or why it's necessarily bad. The parent is not blocked and has already sent the async messages, so its state is still going to be mutating. The only thing that really changes is when the child becomes aware of that mutation, right?
What happens when either the child or the parent starts spinning nested event loops as a result of one of these messages? Are async messages sent from within the nested loop delivered or queued? How does that interact with the existing queue of pending async messages? The whole purpose of the existing RPC semantics was to ensure that unrelated async message traffic was delivered in order, so that people could write correctly ordered state machines (either by hand or using IPDL state machinery).
> Therefore the child sometimes needs to block in such a way that *only* CPOW
> messages are processed. To accomplish this we introduced "urgent" messages.
> Urgent messages are restricted sync messages that can nest.
The primary difference between "sync" and "rpc" in the current setup is whether they can nest, and whether there can be any intervening messages. So please don't use the term "sync". "blocking" would be better here.
> The semantics
> differ in that:
> * An urgent message can be sent from the parent process.
But not from the child? A CPOW message from the child will always be delivered using RPC semantics?
> * An urgent message cannot be initiated if awaiting an rpc, sync, or urgent
> reply.
Initiated by the receiving process, you mean? Or the sending process? Presumably the sending process is blocked anyway and won't be initiating anything. What message type can be nested inside an urgent message by the receiving process? Won't any function that takes a callback require nesting?
> * If the main thread is blocked on an RPC call or synchronous reply, then
> any
> urgent messages will be dispatched as if the main thread were not waiting.
You mean the main thread of the receiving (child) process? We will need to audit the existing sync messages to make sure that they all happen at "safe points" where content code is allowed to run. I am certain that there are existing IPDL clients in necko who use sync messages specifically because it is currently not safe to run content code.
I strongly suspect that it would be a safer overall experience to at least have incoming sync messages nest within the urgent messages, rather than the other way around.
> So far, it seems to have worked. We've been able to shim parts of the
> frontend and make out-of-box Adblock + Noscript work in multiprocess
> Firefox. Hopefully by getting this on mozilla-central we can begin to
> experiment more.
>
> The patch shouldn't affect any existing IPC/IPDL semantics, though the IPDL
> changes are kind of a hack. I'm currently working on getting tests written,
> but I'd like to throw the patch up for comments in the meantime.
I'm very worried about making sure that message dispatch can actually be sound before we commit a lot to it. With plugins we've had to deal with racing RPC messages and instability caused by code which assumes certain orderings for years. It's really easy to construct degenerate cases with nested event loops which are bound to break, and so we've ended up doing horrible prevention techniques like bug 785348 where if you start to spin nested loops, we neuter part of the plugin (painting!) so that we don't have exploitable security issues, but the browser doesn't work either. Since we know that content can spin nested event loops by design, "seems to have worked" isn't really a high enough bar.
I don't know what this means for getting anything landed in central off-by-default (bent will have to comment on the IPDL changes directly).
Comment 3 • 12 years ago (Assignee)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #1)
> cjones is gone, no?
Chris has generously offered to review this for us.
Here's some more background on what we're doing. The main use case we
have for CPOWs is as follows:
- Some event triggers in the content process.
- The content process informs the parent process of the event by
sending a message and blocking on the result (in case a result is
expected from the event handler). Arguments to the event handler
that can't easily be sent over an IPC channel are sent as CPOWs.
- The parent process triggers any handlers/observers for the event,
which may be in addon code or in Firefox code itself.
- Whenever a handler tries to get or set a property on the CPOW, the
parent sends a message back to the child and waits for the result.
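As a rough sketch of that last step (hypothetical names throughout; CPOWProxy, SendGetProperty, and JSVariant are illustrative, not the real interfaces), a parent-side property get might look like:

    // Parent-side CPOW proxy trap: forward a property get to the child
    // over an urgent message and block until it answers. The child can
    // answer even though it is itself blocked in the sendSyncMessage()
    // that delivered the original event.
    bool CPOWProxy::GetProperty(const nsString& aName, JSVariant* aResult)
    {
        if (!SendGetProperty(mObjId, aName, aResult))
            return false;  // channel error, or the child died
        return true;
    }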
We're imposing a number of restrictions. First, we only allow CPOWs
from the parent into the child. The child isn't allowed to see any
parent objects. This eliminates the need for any kind of crazy
distributed garbage collection. Probably the most serious effect of this
restriction is that the parent can't register callbacks on child
objects (at least not via CPOWs).
The second restriction, as David mentioned, is that we're not going to
allow the child to send out any messages while it's responding to a
CPOW request. That simplifies the possible set of behaviors, and so
far it doesn't seem like we need to send messages from the child at
that time.
We're also considering *requiring* that the child is in a known good
state before CPOWs are used. We could do that with two additional
restrictions. First, we could require the child to send CPOWs to the
parent only via some blocking send call. And second, we could "sever"
the CPOWs once that blocking call returns; the severing mechanism
would be similar to what we do to cross-compartment wrappers when we
nuke a compartment. It's not clear if we'll need these restrictions,
but we have them to fall back on if necessary.
We started out trying to use RPC messages to implement this stuff. In
theory, the child should be able to inform the parent of the event via
an RPC message, and then the parent would send any CPOW queries via
RPC as well. However, we ran into some unexpected behavior. Imagine the
parent sends these messages to the child:
1. create object named A
2. do something with A
Consider what happens if, while processing the first message, an event
handler fires in the child that causes it to send an RPC to the
parent. While it's waiting for the RPC response, the child tries to
process message 2, which fails because it hasn't finished creating
object A yet.
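A sketch of the hazard as child-side handlers (the message and helper names are made up for illustration; they are not the real protocol):

    // With current RPC semantics, blocking inside the first handler
    // lets message 2 dispatch before message 1 has finished.
    struct ChildActor {
        Object* mObjectA = nullptr;

        bool RecvCreateObjectA() {
            FireSomeEvent();           // an event handler fires here...
            Reply reply;
            SendQueryParent(&reply);   // ...and blocks on an RPC to the parent;
                                       // RecvDoSomethingWithA can run reentrantly now
            mObjectA = MakeObject();   // too late: message 2 already looked for A
            return true;
        }

        bool RecvDoSomethingWithA() {
            return UseObject(mObjectA); // fails: still null when run reentrantly
        }
    };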
The specific problem we ran into was where messages 1 and 2 were about
opening a tab and running loadFrameScript. We talked a little with
Chris Jones about this issue, and he said that the current RPC
behavior is needed for NPAPI stuff. Benjamin, it sounds like maybe
that's what you were talking about with nested event loops?
In any case, the current behavior makes it difficult to use RPC
messages to do what we need. We didn't want to change the existing
semantics for RPC messages since the plugin code presumably depends
on it. Our alternative is for the child to use sendSyncMessage to
inform the parent that an event has fired, and for the child to use
urgent messages for CPOW stuff. That way, the child won't process
anything except CPOW messages while it waits for the event response.
Comment 5 • 12 years ago
> 1. create object named A
> 2. do something with A
> Consider what happens if, while processing the first message, an event handler
> fires in the child that causes it to send an RPC to the parent. While it's
> waiting for the RPC response, the child tries to process message 2, which
> fails because it hasn't finished creating object A yet.
Necko ran into this problem: we need to dispatch the nsIStreamListener/nsIRequestObserver functions (OnStartRequest, OnDataAvailable, OnStopRequest) in serial order, but since we send them async from the parent, they sit in the child's inbound IPDL queue, and any rpc/sync message that happens during one of them (in our case, someone using synchronous XHR from OnStartRequest) would cause the next message to get dispatched before the previous one completed. For necko we solved this by explicitly queuing the IPDL messages; the code is here in case it's useful to you:
http://mxr.mozilla.org/mozilla-central/source/netwerk/ipc/ChannelEventQueue.h
(I was never clear as to why we couldn't just have IPDL provide a "no reentrancy within a protocol instance" guarantee internally, at least on an opt-in basis. It would also be more efficient, since we currently have to copy messages to queue them.)
Of course the necko case is pretty easy--we just need a particular channel's Start/Data/Stop to be in order. It sounds like CPOWs might need to have some guarantee that different CPOWs are all started and finished serially?
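For reference, the pattern ChannelEventQueue uses boils down to something like this (a heavily simplified sketch with hypothetical types; the real header is linked above): while the queue is "busy", incoming events are copied onto it instead of running, and they are flushed in order once the current event completes.

    #include <deque>
    #include <functional>

    class EventQueue {
        std::deque<std::function<void()>> mQueue;
        bool mBusy = false;
    public:
        void RunOrEnqueue(std::function<void()> aEvent) {
            if (mBusy) {                 // an event is already running
                mQueue.push_back(std::move(aEvent));  // copy/queue it
                return;
            }
            mBusy = true;
            aEvent();
            while (!mQueue.empty()) {    // flush, in order, anything that
                auto next = std::move(mQueue.front()); // arrived meanwhile
                mQueue.pop_front();      // (e.g. during a sync XHR spin)
                next();
            }
            mBusy = false;
        }
    };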
> I am certain that there are existing IPDL clients in necko who use sync
> messages specifically because it is currently not safe to run content code.
I don't think that's right. We use only async msgs in necko except for cookies, which uses a sync message for GetCookie (used IIRC only for document.cookie reads).
Comment 6 • 12 years ago
We discussed this over the phone. While I would like to migrate rpc semantics to the more sensible approach, (i) I don't remember what motivated us to choose the current semantics in the first place; (ii) it's way too scary wrt NPAPI. We discussed implementing an additional semantics, but that's a large amount of work, and the impact of the issue atm seems pretty small in scope. And there's a patch in hand here.
Comment 7 • 12 years ago
Sorry dvander, I got wrapped up in some other things today. Review here is at the top of my list.
Comment 8 • 12 years ago
Comment on attachment 743382 [details] [diff] [review]
v1
Make sure you're building with
ac_add_options --enable-ipdl-tests
The C++ tests are failing with this patch. You can run them with
make -C $objdir/ipc/ipdl/test/cxx check
They hung at TestSyncError 3/3 runs with this patch applied, and completed successfully 3/3 times without it.
This patch also really needs tests. ipc/ipdl/test/README.txt briefly explains how the two testsuites (compiler-only and C++) work. You can run all the tests with
make -C $objdir/ipc/ipdl/test check
The test suites aren't rocket science, should be able to cargo-cult your way in. Feel free to ping me if you get stuck.
r-'ing based on that, but will have comments about the substantial content of the patch in a bit.
Attachment #743382 - Flags: review?(cjones.bugs) → review-
Comment 9 • 12 years ago
Comment on attachment 743382 [details] [diff] [review]
v1
>diff --git a/ipc/glue/RPCChannel.cpp b/ipc/glue/RPCChannel.cpp
> bool
> RPCChannel::Call(Message* _msg, Message* reply)
> {
>+ NS_ABORT_IF_FALSE(!mPendingReply, "cannot nest sync");
>+ if (mPendingReply)
>+ return false;
This should be an assert.
> nsAutoPtr<Message> msg(_msg);
> AssertWorkerThread();
> mMonitor->AssertNotCurrentThreadOwns();
>- RPC_ASSERT(!ProcessingSyncMessage(),
>- "violation of sync handler invariant");
I don't quite understand why this is being removed.
>@@ -188,45 +193,58 @@ RPCChannel::Call(Message* _msg, Message*
>+ else if (!mUrgent.empty()) {
>+ recvd = mUrgent.front();
>+ mUrgent.pop_front();
>+ }
I'm not sure it matters, but we need to think about whether we should
process out-of-turn replies before urgent messages. My intuition is
that it doesn't matter, but I'm not sure.
>@@ -373,21 +390,26 @@ RPCChannel::OnMaybeDequeueOne()
>+ MessageQueue *queue = mUrgent.empty()
>+ ? mPending.empty()
>+ ? &mNonUrgentDeferred
>+ : &mPending
>+ : &mUrgent;
Shouldn't we flush the non-urgent deferred queue before the pending queue? Actually, it's not clear to me that those queues won't overlap, by which I mean there could be both older and newer messages in mPending than in mNonUrgentDeferred. We need to think about this so as not to trash in-order delivery :).
>@@ -693,16 +715,26 @@ RPCChannel::OnMessageReceivedFromLink(co
>+ // Urgent messages must be delivered immediately.
>+ if (msg.priority() == IPC::Message::PRIORITY_HIGH) {
>+ mUrgent.push_back(msg);
>+ if (!AwaitingSyncReply() && (0 == StackDepth() && !mBlockedOnParent))
>+ mWorkerLoop->PostTask(FROM_HERE, new DequeueTask(mDequeueOneTask));
>+ else
>+ NotifyWorkerThread();
>+ return;
I don't quite understand the logic of duplicating this code vs. choosing the queue based on priority, below.
>diff --git a/ipc/glue/RPCChannel.h b/ipc/glue/RPCChannel.h
>+ // List of async and sync messages that have been received while waiting
>+ // for an urgent reply, that need to be deferred until that reply has been
>+ // received.
>+ MessageQueue mNonUrgentDeferred;
These can be rpc as well, right?
>diff --git a/ipc/glue/SyncChannel.cpp b/ipc/glue/SyncChannel.cpp
The original goal was to "layer" messaging semantics on each other,
async < sync < rpc, but this code has gotten cross-invasive enough
that that's turned out to be a bad idea. Sorry about that.
> bool
> SyncChannel::Send(Message* _msg, Message* reply)
> {
>+ NS_ABORT_IF_FALSE(!mPendingReply, "cannot nest sync");
>+ if (mPendingReply)
>+ return false;
Again, I'd prefer this remain an assertion unless there's a compelling
reason otherwise.
I need to come back and take another pass over the SyncChannel
changes, but I think there's enough to discuss in the meantime.
>diff --git a/ipc/ipdl/ipdl/ast.py b/ipc/ipdl/ipdl/ast.py
The compiler changes look fine to me.
Comment 10 • 12 years ago
It strikes me that if
> We started out trying to use RPC messages to implement this stuff. In
> theory, the child should be able to inform the parent of the event via
> an RPC message, and then the parent would send any CPOW queries via
> RPC as well. However, we ran into some unexpected behavior. Imagine the
> parent sends these messages to the child:
> 1. create object named A
> 2. do something with A
> Consider what happens if, while processing the first message, an event
> handler fires in the child that causes it to send an RPC to the
> parent. While it's waiting for the RPC response, the child tries to
> process message 2, which fails because it hasn't finished creating
> object A yet.
is the only problem you're running into, that is, an rpc <--> rpc race between parent and child, then you can use RPC race resolution to make sure the parent "wins" for the messages you care about. The example is here:
http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginMessageUtils.cpp#70
However, race resolution doesn't work for async <--> rpc or sync <--> rpc races, since messaging semantics dictate how those should be resolved.
Certainly worth a try!
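For what that might look like: a sketch in the style of the linked plugin code (the RacyRPCPolicy names are recalled from the old RPCChannel API and the message type is made up; treat the whole thing as approximate, not a working hook):

    // When both sides have an in-flight RPC, a mediation hook picks a
    // winner for specific message types instead of the default policy.
    RPCChannel::RacyRPCPolicy
    MediateRace(const Message& parent, const Message& child)
    {
        switch (parent.type()) {
        case PContent::Msg_CreateObjectA:      // hypothetical message type
            return RPCChannel::RRPParentWins;  // parent's RPC dispatches first
        default:
            return RPCChannel::RRPChildWins;   // default resolution
        }
    }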
Comment 11 • 12 years ago
Comment on attachment 743382 [details] [diff] [review]
v1
>diff --git a/ipc/glue/SyncChannel.cpp b/ipc/glue/SyncChannel.cpp
>+#include "mozilla/ipc/RPCChannel.h"
Ew :).
>
> #include "nsDebug.h"
> #include "nsTraceRefcnt.h"
>
> using mozilla::MonitorAutoLock;
>
> template<>
> struct RunnableMethodTraits<mozilla::ipc::SyncChannel>
>@@ -57,22 +58,59 @@ bool SyncChannel::sIsPumpingMessages = f
>
> bool
> SyncChannel::EventOccurred()
> {
> AssertWorkerThread();
> mMonitor->AssertCurrentThreadOwns();
> NS_ABORT_IF_FALSE(AwaitingSyncReply(), "not in wait loop");
>
>- return (!Connected() || 0 != mRecvd.type() || mRecvd.is_reply_error());
>+ return !Connected() ||
>+ mRecvd.type() != 0 ||
>+ mRecvd.is_reply_error() ||
>+ !mUrgent.empty();
>+}
>+
>+bool
>+SyncChannel::ProcessUrgentMessages()
>+{
>+ while (!mUrgent.empty()) {
>+ Message msg = mUrgent.front();
>+ mUrgent.pop_front();
>+
>+ MOZ_ASSERT(msg.priority() == IPC::Message::PRIORITY_HIGH);
>+
>+ {
>+ MOZ_ASSERT(msg.is_sync() || msg.is_rpc());
>+
>+ MonitorAutoUnlock unlock(*mMonitor);
>+ SyncChannel::OnDispatchMessage(msg);
>+ }
>+
>+ // Check state that could have been changed during dispatch.
>+ if (!Connected()) {
>+ ReportConnectionError("SyncChannel");
>+ return false;
>+ }
>+
>+ // We should not have received another synchronous reply,
>+ // because we cannot send synchronous messages in this state.
>+ MOZ_ASSERT(mRecvd.type() == 0);
>+ }
>+
>+ return true;
> }
>
> bool
> SyncChannel::Send(Message* _msg, Message* reply)
> {
>+ NS_ABORT_IF_FALSE(!mPendingReply, "cannot nest sync");
>+ if (mPendingReply)
>+ return false;
Again, should be an assertion.
>@@ -89,72 +127,104 @@ SyncChannel::Send(Message* _msg, Message
> while (1) {
>- bool maybeTimedOut = !SyncChannel::WaitForNotify();
>+ // if a handler invoked by *Dispatch*() spun a nested event
>+ // loop, and the connection was broken during that loop, we
>+ // might have already processed the OnError event. if so,
>+ // trying another loop iteration will be futile because
>+ // channel state will have been cleared
>+ if (!Connected()) {
Please write a test for this case if you can.
>+ // Process all urgent messages. We forbid nesting synchronous sends,
>+ // so mPendingReply etc will still be valid.
>+ if (!ProcessUrgentMessages())
>+ return false;
I'm starting to get the feeling that we should do away with the
SyncChannel/RPCChannel distinction, since the wait loops look so
similar now. That should make the changes in this patch easier to
understand and maintain.
I can't say I like the SyncChannel changes, since we're importing more
logic from RPCChannel, but I think I understand them and I think they
look OK. But let's see if we can get away with RPC race resolution.
Comment 12 • 12 years ago (Assignee)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> Comment on attachment 743382 [details] [diff] [review]
> v1
>
> Make sure you're building with
>
> ac_add_options --enable-ipdl-tests
>
> The C++ tests are failing with this patch. You can run them with
>
> make -C $objdir/ipc/ipdl/test/cxx check
>
> They hung at TestSyncError 3/3 runs with this patch applied, and completed
> successfully 3/3 times without it.
>
> This patch also really needs tests. ipc/ipdl/test/README.txt briefly
> explains how the two testsuites (compiler-only and C++) work. You can run
> all the tests with
>
> make -C $objdir/ipc/ipdl/test check
>
> The test suites aren't rocket science, should be able to cargo-cult your way
> in. Feel free to ping me if you get stuck.
>
> r-'ing based on that, but will have comments about the substantial content
> of the patch in a bit.
Okay, I fixed that test, thanks. I'll have tests for the next patch.
Comment 13 • 12 years ago (Assignee)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
> >@@ -188,45 +193,58 @@ RPCChannel::Call(Message* _msg, Message*
>
> >+ else if (!mUrgent.empty()) {
> >+ recvd = mUrgent.front();
> >+ mUrgent.pop_front();
> >+ }
>
> I'm not sure it matters, but we need to think about whether we should
> process out-of-turn replies before urgent messages. My intuition is
> that it doesn't matter, but I'm not sure.
Right now we do process out-of-turn replies first. I don't think it matters, at least not yet.
> >@@ -373,21 +390,26 @@ RPCChannel::OnMaybeDequeueOne()
>
> >+ MessageQueue *queue = mUrgent.empty()
> >+ ? mPending.empty()
> >+ ? &mNonUrgentDeferred
> >+ : &mPending
> >+ : &mUrgent;
>
> Shouldn't we flush the non-urgent deferred queue before the pending
> queue? Actually, it's not clear to me that those queues won't
> overlap, by which I mean there could be both older and newer messages
> in mPending than in mNonUrgentDeferred. We need to think about so as
> not to trash in-order delivery :).
Hrm, good point. I'm still thinking about this.
> The original goal was to "layer" messaging semantics on each other,
> async < sync < rpc, but this code has gotten cross-invasive enough
> that that's turned out to be a bad idea. Sorry about that.
Yeah, I agree; Sync/RPC, and maybe even the Async code, should all be merged together someday. I might attempt that if nesting urgent messages gets too complicated.
Comment 14 • 12 years ago (Assignee)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #11)
> > while (1) {
> >- bool maybeTimedOut = !SyncChannel::WaitForNotify();
> >+ // if a handler invoked by *Dispatch*() spun a nested event
> >+ // loop, and the connection was broken during that loop, we
> >+ // might have already processed the OnError event. if so,
> >+ // trying another loop iteration will be futile because
> >+ // channel state will have been cleared
> >+ if (!Connected()) {
>
> Please write a test for this case if you can.
This is pre-existing code, it just got indented.
> I can't say I like the SyncChannel changes, since we're importing more
> logic from RPCChannel, but I think I understand them and I think they
> look OK. But let's see if we can get away with RPC race resolution.
We were encountering ordering problems with async messages coming from chrome. If I'm understanding it correctly, RPC race resolution would only work if we changed those messages to be rpc?
Comment 15 • 12 years ago
(In reply to David Anderson [:dvander] from comment #14)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #11)
>
> > I can't say I like the SyncChannel changes, since we're importing more
> > logic from RPCChannel, but I think I understand them and I think they
> > look OK. But let's see if we can get away with RPC race resolution.
>
> We were encountering ordering problems with async messages coming from
> Chrome. If I'm understanding it correctly - the RPC race resolution would
> only work if we changed those messages to be rpc?
Correct, rpc doesn't "race" with async (or with sync), semantically. OK then.
Comment 16 • 12 years ago (Assignee)
Addresses review comments, and adds some tests for urgent messages.
re: mPending versus mNonUrgentDeferred and in-order delivery problems. You're right, I flipped the order in which the queues are resolved, thanks. Whether the queues can still be out-of-order...
RPCChannel::OnMessageReceivedFromLink will fill mPending with any message that's not a reply. RPCChannel::Call()'s wait loop will then pop mPending, and either immediately process it or move it into mNonUrgentDeferred. The question is whether we can add to mPending, without moving to mNonUrgentDeferred, or without processing mNonUrgentDeferred first.
We don't allow anything to nest on top of urgent messages, so we shouldn't be spinning up nested event loops or anything. Even once we do nest urgent messages (on top of urgent messages only), as long as no rpc/sync stuff can happen on top, it should be okay.
I don't know enough to say for sure, so I'll try asserting to that effect.
Attachment #743382 - Attachment is obsolete: true
Attachment #755133 - Flags: review?(cjones.bugs)
Comment 17 • 12 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
> >diff --git a/ipc/glue/RPCChannel.h b/ipc/glue/RPCChannel.h
>
> >+ // List of async and sync messages that have been received while waiting
> >+ // for an urgent reply, that need to be deferred until that reply has been
> >+ // received.
> >+ MessageQueue mNonUrgentDeferred;
>
> These can be rpc as well, right?
They cannot.
Comment 18 • 12 years ago
Comment on attachment 755133 [details] [diff] [review]
v2
>diff --git a/ipc/glue/RPCChannel.cpp b/ipc/glue/RPCChannel.cpp
> bool
> RPCChannel::Call(Message* _msg, Message* reply)
> {
>+ MOZ_ASSERT(!mPendingReply);
Nit-ish: Please use |RPC_ASSERT(!mPendingReply)|, which also logs info
about channel state like pseudo-stack.
>@@ -161,45 +166,58 @@ RPCChannel::Call(Message* _msg, Message*
>+ if (urgent && recvd.priority() != IPC::Message::PRIORITY_HIGH) {
>+ // If we're waiting for an urgent reply, don't process any
>+ // messages yet.
>+ mNonUrgentDeferred.push_back(recvd);
>+ } else {
>+ MonitorAutoUnlock unlock(*mMonitor);
>+ CxxStackFrame f(*this, IN_MESSAGE, &recvd);
>+ AsyncChannel::OnDispatchMessage(recvd);
>+ }
> continue;
> }
>
> if (recvd.is_sync()) {
>+ if (urgent && recvd.priority() != IPC::Message::PRIORITY_HIGH) {
>+ // If we're waiting for an urgent reply, don't process any
>+ // messages yet.
>+ mNonUrgentDeferred.push_back(recvd);
>+ } else {
>+ RPC_ASSERT(mPending.empty(),
>+ "other side should have been blocked");
>+ MonitorAutoUnlock unlock(*mMonitor);
>+ CxxStackFrame f(*this, IN_MESSAGE, &recvd);
>+ SyncChannel::OnDispatchMessage(recvd);
>+ }
Nit-ish: I'd prefer to avoid the code duplication here with
if (!recvd.is_rpc()) {
if (urgent && ...) {
...
} else if(!recvd.is_sync()) {
...
} else {
...
}
continue;
}
>- if (0 == StackDepth()) {
>- // the worker thread might be idle, make sure it wakes up
>+ // There are three cases we're concerned about, relating to the state of
>+ // the main thread:
>+ //
>+ // (1) We are waiting on a sync reply - main thread is blocked on the IPC monitor.
>+ // - If the message is high priority, we wake up the main thread to
>+ // deliver the message. Otherwise, we leave it in the mPending queue,
>+ // posting a task to the main event loop, where it will be processed
>+ // once the synchronous reply has been received.
>+ //
>+ // (2) We are waiting on an RPC reply - main thread is blocked on the IPC monitor.
>+ // - Always wake up the main thread to deliver the message.
>+ //
>+ // (3) We are not waiting on a reply.
>+ // - We post a task to the main event loop.
>+ //
>+ bool waiting_rpc = (0 != StackDepth());
>+ bool urgent = (msg.priority() == IPC::Message::PRIORITY_HIGH);
>+ if (!waiting_rpc && !(AwaitingSyncReply() && urgent)) {
A subtlety here that I had to reassure myself of is an outgoing sync
message racing with incoming RPC. In that case,
!waiting_rpc = true
(AwaitingSyncReply() && urgent) = false
so we would end up enqueuing the RPC for the main event-loop and not
wake up the SyncChannel waiter. That might seem wrong, but it's
actually the correct way to resolve that semantic race.
With that in mind, it might be clearer to flip the bits we check here
if (waiting_rpc || (AwaitingSyncReply() && urgent)) {
// Always wake up our rpc waiter, and wake up sync waiters for
// urgent messages.
Notify();
} else {
// Worker thread is either not blocked on a reply, or this is an
// incoming RPC that raced with outgoing sync and needs to be
// deferred to a later event-loop iteration.
//...
}
>diff --git a/ipc/ipdl/test/cxx/Makefile.in b/ipc/ipdl/test/cxx/Makefile.in
> TestBadActor \
>+ TestUrgency \
Nit: tab.
> $(NULL)
>
> ifeq ($(OS_ARCH),Linux)
> IPDLTESTS += TestSysVShmem
> endif
>
> EXTRA_PROTOCOLS = \
> TestBridgeSub \
>diff --git a/ipc/ipdl/test/cxx/PTestUrgency.ipdl b/ipc/ipdl/test/cxx/PTestUrgency.ipdl
>+enum {
>+ kFirstTestBegin = 1,
>+ kFirstTestGotReply,
>+ kSecondTestBegin,
>+ kSecondTestGotReply,
>+};
Would be nice to write a protocol state machine for this test and
replace this with asserts over actor |state()|, but as state machines
are currently spec'd, that gets pretty hard with message races (bug
547703 and followups).
OK, this is looking good. These are mostly just stylistic changes,
but I'd like to take one more pass before r+. Still afraid of missing
something :).
Attachment #755133 - Flags: review?(cjones.bugs)
Comment 19 • 12 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #18)
> With that in mind, it might be clearer to flip the bits we check here
>
> if (waiting_rpc || (AwaitingSyncReply() && urgent)) {
> // Always wake up our rpc waiter, and wake up sync waiters for
> // urgent messages.
> Notify();
> } else {
> // Worker thread is either not blocked on a reply, or this is an
> // incoming RPC that raced with outgoing sync and needs to be
> // deferred to a later event-loop iteration.
> //...
> }
I meant to add, but please go with whatever is clearer to you.
Comment 20 • 12 years ago (Assignee)
Nits applied - I agree the inverted logic is easier to read. Getting those conditions right was the most difficult part of this, TBH.
I didn't have a chance to do the protocol state machine yet, but wanted to make sure the rest of the patch was in a landable state since I'm on PTO this week.
Attachment #755133 - Attachment is obsolete: true
Attachment #760294 - Flags: review?(cjones.bugs)
Comment 21 • 12 years ago
(In reply to David Anderson [:dvander] from comment #20)
> Created attachment 760294 [details] [diff] [review]
> v3
>
> Nits applied - I agree the inverted logic is easier to read. Getting those
> conditions right was the most difficult part of this, TBH.
Yeah, it was also the hardest to review :).
> I didn't have a chance to do the protocol state machine yet,
I would skip that. I think all you would gain from that is learning what the limitations in the current system are ;).
> but wanted to
> make sure the rest of the patch was in a landable state since I'm on PTO
> this week.
D'oh, I started traveling the day you sent this :(. Sorry for lag.
Updated • 12 years ago
Attachment #760294 - Flags: review?(cjones.bugs) → review+
Comment 22 • 12 years ago (Assignee)
Comment 23 • 12 years ago
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Comment 24 • 12 years ago
I ended up having to backport this:
https://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?changeset=3266c1d73816
status-firefox24: --- → fixed