Introduce new RPC messaging semantics

RESOLVED FIXED in mozilla27

Status

()

Core
IPC
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: dvander, Assigned: dvander)

Tracking

({dev-doc-needed})

unspecified
mozilla27
dev-doc-needed
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 5 obsolete attachments)

(Assignee)

Description

5 years ago
Once RPC has been renamed to Interrupt, we can steal the name RPC for a new protocol. The semantics will be identical to "sync", except that when it wakes up to process an urgent request, a new rpc call can be initiated while the old one is still waiting on a reply. Like sync, they are strictly limited to initiating from the child process, and like sync, they can wake up for an urgent message.

Urgent message semantics will change as well, in that the parent process can initiate a new urgent request while replying to an rpc call. Additionally, if the parent process is blocked waiting for an urgent reply, it will be able to wake up to handle an incoming rpc call.
(Assignee)

Updated

5 years ago
Assignee: nobody → dvander
Status: NEW → ASSIGNED
(Assignee)

Comment 1

5 years ago
Created attachment 796992 [details] [diff] [review]
wip v1

Seems to work with the small test case, will need to add the nsFrameMessageManager and addon support to test fully though.
(Assignee)

Comment 2

4 years ago
Created attachment 798079 [details] [diff] [review]
part 1, wip v2 - rpc protocol
Attachment #796992 - Attachment is obsolete: true
(Assignee)

Comment 3

4 years ago
Created attachment 798083 [details] [diff] [review]
part 2, sendRpcMessage for message manager
(Assignee)

Comment 4

4 years ago
Created attachment 800308 [details] [diff] [review]
part 1, wip v3 - rpc semantics

This version fixes a potential race scenario, where if the child and parent send each other urgent and rpc messages at the same time, the parent must not wake up inside its urgent message to process the preceding rpc message. The new RPC protocol does not tolerate out-of-order replies to urgent messages, so we must guarantee that any interrupting messages are in response to a CPOW and have blocked the child.
Attachment #798079 - Attachment is obsolete: true
(Assignee)

Comment 5

4 years ago
Comment on attachment 800308 [details] [diff] [review]
part 1, wip v3 - rpc semantics

I left a big comment in MessageChannel.h explaining how the rpc/urgent race resolution works - I don't know whether this is the best approach though (I was trying to avoid the whole out-of-order reply complexity that the plugin message semantics use).
Attachment #800308 - Flags: review?(cjones.bugs)
(Assignee)

Comment 6

4 years ago
Created attachment 800384 [details] [diff] [review]
part 2: messagemanager changes

Adds sendRpcMessage to nsFrameMessageManager, and change the e10s addon compatibility code to use this call instead of sendSyncMessage (which fixes the two crashing addons).
Attachment #798083 - Attachment is obsolete: true
Attachment #800384 - Flags: review?(bugs)

Comment 7

4 years ago
Comment on attachment 800384 [details] [diff] [review]
part 2: messagemanager changes

I assume an updated patch to handle nested sendSyncMessage is coming.
(throw an exception).

Though I still wonder if sendSyncMessage should just use rpc.
But we can start with separate rpc message approach.
Attachment #800384 - Flags: review?(bugs)
(Assignee)

Comment 8

4 years ago
Created attachment 800508 [details] [diff] [review]
part 2, sendRpcMessage for message manager

Throws an exception inside nested sendSyncMessage, so we don't kill the child (also added a test case for this).
Attachment #800384 - Attachment is obsolete: true
Attachment #800508 - Flags: review?(bugs)
/me puts on pedant cap temporarily ;)
Summary: Introduce new RPC protocol → Introduce new RPC messaging semantics
(Assignee)

Comment 10

4 years ago
Comment on attachment 800308 [details] [diff] [review]
part 1, wip v3 - rpc semantics

;)
Attachment #800308 - Attachment description: part 1, wip v3 - rpc protocol → part 1, wip v3 - rpc semantics
Comment on attachment 800308 [details] [diff] [review]
part 1, wip v3 - rpc semantics

I'm not done with the review yet, but so far things are looking about how I expected.  Have to run out for a bit again :/.

One quick question: did you decide against allocating "transaction IDs" from the seqno pool?  I don't have a strong opinion on that yet, but reusing seqno's may simplify things a bit.
(Assignee)

Comment 12

4 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #11)
> One quick question: did you decide against allocating "transaction IDs" from
> the seqno pool?  I don't have a strong opinion on that yet, but reusing
> seqno's may simplify things a bit.

At the time, I wasn't sure whether it was safe, but thinking about it now it seems like it should work. It'd still need to be a separate field on the message but we could get rid of the second sequence generator.
(In reply to David Anderson [:dvander] from comment #12)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #11)
> > One quick question: did you decide against allocating "transaction IDs" from
> > the seqno pool?  I don't have a strong opinion on that yet, but reusing
> > seqno's may simplify things a bit.
> 
> At the time, I wasn't sure whether it was safe, but thinking about it now it
> seems like it should work.

Yeah it would work.

> It'd still need to be a separate field on the
> message but we could get rid of the second sequence generator.

Yep.
Comment on attachment 800308 [details] [diff] [review]
part 1, wip v3 - rpc semantics

>diff --git a/ipc/glue/MessageChannel.cpp b/ipc/glue/MessageChannel.cpp

>@@ -286,43 +290,61 @@ MessageChannel::OnMessageReceivedFromLin

>+    bool shouldWakeUp = AwaitingInterruptReply() ||
>+                        // Allow incoming events to be processed inside a CPOW.

(Following note in MessageChannel.h, let's drop the CPOW references in
this file.)

>+    // If we're receiving an RPC message while blocked on an urgent message,
>+    // we must defer any messages that are not part of the current transaction.
>+    if (shouldWakeUp && (AwaitingUrgentReply() && aMsg.is_rpc())) {
>+        if (aMsg.transaction_id() != mCurrentRPCTransaction)
>+            shouldWakeUp = false;
>+    }

IIUC, this is the core of the race resolution algorithm for
urgent/rpc.  It might be worth going into a bit of detail about that.
In particular, I'm concerned about how this operates in conjunction
with the comment about atomicity of |cond.Wait()| in
AutoEnterRPCTransaction below.  If we miss wakeups, we're hosed :), so
it's worth thinking through a proof-y style argument.

>+    if (shouldWakeUp) {
>         // Always wake up our Interrupt waiter, and wake up sync waiters for urgent
>         // messages.

And rpc waiters for urgent messages, and urgent waiters for rpc's in
the same txn.

>@@ -379,31 +395,62 @@ MessageChannel::UrgentCall(Message* aMsg

> bool
>+MessageChannel::RPCCall(Message* aMsg, Message* aReply)
>+{

>+    // RPC calls must be the only thing on the stack.
>+    IPC_ASSERT(!AwaitingInterruptReply(), "rpc calls cannot be issued within interrupts");
>+    IPC_ASSERT(!AwaitingSyncReply(), "rpc calls cannot be issued within sync sends");

This would make it illegal to rpc in response to urgent messages
received while blocked on an intr or sync outcall, right?  I don't
recall us discussing that in great detail, but ISTR that we needed to
support that.

> bool
>+MessageChannel::ProcessPendingUrgentRequest()
>+{
>+    AssertWorkerThread();
>+    mMonitor->AssertCurrentThreadOwns();
>+
>+    // Note that it is possible we could have sent a sync message at
>+    // the same time the parent process sent an urgent message, and
>+    // therefore mPendingUrgentRequest is set *and* mRecvd is set as
>+    // well, because the link thread received both before the worker
>+    // thread woke up.
>+    //
>+    // In this case, we process the urgent message first, but we need
>+    // to save the reply. Note that getting 

...?

>+    // At this point, mRecvd should still be empty, since any replies
>+    // should have been handled by an inner out-call.
>+    IPC_ASSERT(!mRecvd, "unknown reply");

I'm not sure I'm convinced by this.  We drop the monitor for
DispatchUrgentMessage(), so isn't it possible for |mRecvd| to be NULL
when we enter this function; then a reply to arrive in the narrow
window after we finish Dispatch*(), but before we re-acquire the
monitor?  I may be missing something.

Same for below.

>diff --git a/ipc/glue/MessageChannel.h b/ipc/glue/MessageChannel.h

>@@ -495,19 +502,76 @@ class MessageChannel : HasResultCodes

>+    // Worker-thread only; Number of urgent and rpc replies we're waiting on.
>+    // These are mutually exclusive since one channel cannot have outcalls of
>+    // both kinds.
>     size_t mPendingUrgentReplies;
>+    size_t mPendingRPCReplies;

Combine into one counter then?  Not a big deal either way.

>+
>+    // When we send an urgent request from the parent process, we could race
>+    // with an RPC message that was issued by the child beforehand. In this
>+    // case, if the parent were to wake up while waiting for a CPOW reply,
>+    // and process the RPC, it could send an additional CPOW, and receive
>+    // replies out of order.

There's one bit of information missing from this: the reason that
scenario is possible is because the child will always wake up to
process the urgent message.  (Not terribly unobvious, but I think
repeating that makes this easier to understand.)

Let's leave mention of "CPOW" out of this header, since that's a much
higher-level concept.  It's a little confusing in this context.

>+    // To address this problem, urgent or RPC requests are associated with a
>+    // "transaction". Whenever one side of the channel wishes to start a
>+    // chain of RPC/urgent messages, it allocates a new transaction ID. Any
>+    // messages the parent receives, not apart of this transaction, are
>+    // deferred. When issuing RPC/urgent requests on top of a started
>+    // transaction, the initiating transaction ID is used.
>+    // 
>+    // On the child side, transaction IDs grow down; on the parent side,
>+    // they grow up.

(An intro like, "To ensure IDs are unique, ..." intro would help this
a bit.)

Also, this is the same scheme we use for sequence numbers.  Is there
any reason we can't allocate transaction IDs from the sequence-number
pool?  I'm not sure if it would make this much simpler or not, but it
might be nice to consolidate that code.

>+    class AutoEnterRPCTransaction
>+    {
>+      public:
>+       AutoEnterRPCTransaction(MessageChannel *aChan)
>+        : mChan(aChan),
>+          mOldTransaction(mChan->mCurrentRPCTransaction)
>+       {
>+           if (mChan->mCurrentRPCTransaction == 0)
>+               mChan->mCurrentRPCTransaction = mChan->NextRPCTransaction();
>+       }
>+       AutoEnterRPCTransaction(MessageChannel *aChan, int32_t received)
>+        : mChan(aChan),
>+          mOldTransaction(mChan->mCurrentRPCTransaction)
>+       {
>+           MOZ_ASSERT_IF(mChan->mSide == ParentSide,
>+                         !mOldTransaction || mOldTransaction == received);
>+
>+           MonitorAutoLock lock(*mChan->mMonitor);

This looks a bit suspicious to me.  It looks like it might break the
invariant that awaiting-message state seen by the IO thread is updated
atomically with the tracee going into |cond.Wait()|, if necessary.  

>@@ -543,17 +607,23 @@ class MessageChannel : HasResultCodes

>+    // Note that these two pointers are mutually exclusive. One channel cannot
>+    // send both urgent requests (parent -> child) and RPC calls (child->parent).
>+    // Also note that since initiating either requires blocking, they cannot
>+    // queue up on the other side. One message slot is enough.

And because they're processed in a different "mode" from the non-rpc
messages, which are queued separately if they're received while in the
rpc "mode".

>diff --git a/ipc/ipdl/ipdl/ast.py b/ipc/ipdl/ipdl/ast.py

>+class RPC:
>+    pretty = 'rpc'
>+    @classmethod
>+    def __hash__(cls): return hash(cls.pretty)
>+    @classmethod
>+    def __str__(cls):  return cls.pretty

The code duplication here is my fault, but I might be inclined to an
r++ if you tidied up a bit while you're in here ;).  Your choice
though.

>diff --git a/ipc/ipdl/ipdl/lower.py b/ipc/ipdl/ipdl/lower.py
 
> def _sendPrefix(msgtype):
>     """Prefix of the name of the C++ method that sends |msgtype|."""
>-    if msgtype.isInterrupt() or msgtype.isUrgent():
>+    if msgtype.isInterrupt() or msgtype.isUrgent() or msgtype.isRpc():

(There are more messaging semantics that use "Send" now than use
"Call" ;), but I don't particularly care about this.)

>diff --git a/ipc/ipdl/ipdl/type.py b/ipc/ipdl/ipdl/type.py

>-    def hasReply(self):  return self.isSync() or self.isInterrupt() or self.isUrgent()
>+    def hasReply(self):  return (self.isSync() or
>+                                 self.isInterrupt() or
>+                                 self.isUrgent() or
>+                                 self.isRpc())

Nit: (foo()
      or bar()
      or baz())

> 
>     def needsMoreJuiceThan(self, o):
>         return (o.isAsync() and not self.isAsync()
>-                or o.isSync() and self.isInterrupt())
>+                or o.isSync() and (self.isUrgent() or self.isRpc())
>+                or (o.isUrgent() or o.isRpc()) and self.isInterrupt())

Need a compiler error test for this.

>@@ -1462,16 +1468,22 @@ class CheckTypes(TcheckVisitor):

>+        if mtype.isRpc() and (mtype.isOut() or mtype.isInout()):
>+            self.error(
>+                loc,
>+                "rpc parent-to-child messages are verboten (here, message' `%s' in protocol `%s')",
>+                mname, pname)
>+

Need a compiler error test for this.

I've hated the IPDL compiler code for a while, but I'm somewhat
surprised your patch was this small.

>diff --git a/ipc/ipdl/test/cxx/Makefile.in b/ipc/ipdl/test/cxx/Makefile.in

>   TestInterruptShutdownRace \
>+	TestRPC \

Nit: tab character.

I didn't look over the tests in review v1 because we have some
higher-level issues to remind me of / sort out.

On the whole this looks very very good.  No qualms r+'ing so far once
we get the edge cases worked out.
Attachment #800308 - Flags: review?(cjones.bugs)
(Assignee)

Comment 15

4 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #14)
> Comment on attachment 800308 [details] [diff] [review]
> part 1, wip v3 - rpc semantics
> 
> >diff --git a/ipc/glue/MessageChannel.cpp b/ipc/glue/MessageChannel.cpp
> 
> >@@ -379,31 +395,62 @@ MessageChannel::UrgentCall(Message* aMsg
> 
> > bool
> >+MessageChannel::RPCCall(Message* aMsg, Message* aReply)
> >+{
> 
> >+    // RPC calls must be the only thing on the stack.
> >+    IPC_ASSERT(!AwaitingInterruptReply(), "rpc calls cannot be issued within interrupts");
> >+    IPC_ASSERT(!AwaitingSyncReply(), "rpc calls cannot be issued within sync sends");
> 
> This would make it illegal to rpc in response to urgent messages
> received while blocked on an intr or sync outcall, right?  I don't
> recall us discussing that in great detail, but ISTR that we needed to
> support that.

Hrm, right. I'll add a test case for that, we should support it.

> >+    // In this case, we process the urgent message first, but we need
> >+    // to save the reply. Note that getting 
> 

I forgot what I was going to say :) deleted.

> 
> >+    // At this point, mRecvd should still be empty, since any replies
> >+    // should have been handled by an inner out-call.
> >+    IPC_ASSERT(!mRecvd, "unknown reply");
> 
> I'm not sure I'm convinced by this.  We drop the monitor for
> DispatchUrgentMessage(), so isn't it possible for |mRecvd| to be NULL
> when we enter this function; then a reply to arrive in the narrow
> window after we finish Dispatch*(), but before we re-acquire the
> monitor?  I may be missing something.
> 
> Same for below.

Okay, right, I see what you're saying. It's not before dispatch but after, once the parent has unblocked from the urgent call. The assert is indeed bogus, it should be (!mRecvd || !savedReply), and mRecvd has to be set appropriately.

> >+    // Worker-thread only; Number of urgent and rpc replies we're waiting on.
> >+    // These are mutually exclusive since one channel cannot have outcalls of
> >+    // both kinds.
> >     size_t mPendingUrgentReplies;
> >+    size_t mPendingRPCReplies;
> 
> Combine into one counter then?  Not a big deal either way.

I'd like to keep them separate just for clarity, also Awaiting*() use them.

> >diff --git a/ipc/ipdl/ipdl/ast.py b/ipc/ipdl/ipdl/ast.py
> 
> >+class RPC:
> >+    pretty = 'rpc'
> >+    @classmethod
> >+    def __hash__(cls): return hash(cls.pretty)
> >+    @classmethod
> >+    def __str__(cls):  return cls.pretty
> 
> The code duplication here is my fault, but I might be inclined to an
> r++ if you tidied up a bit while you're in here ;).  Your choice
> though.

Is it okay to store the pretty name as a self variable instead, and derive all the singletons from a base class?
(In reply to David Anderson [:dvander] from comment #15)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #14)
> > The code duplication here is my fault, but I might be inclined to an
> > r++ if you tidied up a bit while you're in here ;).  Your choice
> > though.
> 
> Is it okay to store the pretty name as a self variable instead, and derive
> all the singletons from a base class?

Yeah, that's what I had in mind.
Comment on attachment 800508 [details] [diff] [review]
part 2, sendRpcMessage for message manager

I do expect that we'll need to convert sendSync to do internally Rpc (and that may lead to new problems) at some point.

I'm somewhat surprised if spinning the event loop while calling
sendSyncMessage doesn't cause us major problems - the same way as what
Sync XHR and modal dialogs cause in current Gecko.



>+nsresult
>+nsFrameMessageManager::SendMessage(const nsAString& aMessageName,
>+                                   const JS::Value& aJSON,
>+                                   const JS::Value& aObjects,
>+                                   JSContext* aCx,
>+                                   uint8_t aArgc,
>+                                   JS::Value* aRetval,
>+                                   bool aIsSync)
>+{
>   NS_ASSERTION(!IsGlobal(), "Should not call SendSyncMessage in chrome");
>   NS_ASSERTION(!IsWindowLevel(), "Should not call SendSyncMessage in chrome");
>   NS_ASSERTION(!mParentManager, "Should not have parent manager in content!");
> 
>   *aRetval = JSVAL_VOID;
>   NS_ENSURE_TRUE(mCallback, NS_ERROR_NOT_INITIALIZED);
> 
>+  if (sSendingSyncMessage) {
>+    // No kind of blocking send should be issued on top of a sync message.
>+    return NS_ERROR_UNEXPECTED;
>+  }
oh this way. I thought you wanted to let rpc messages go through. If so
aIsSync && sSendingSyncMessage ....

>+  sSendingSyncMessage = aIsSync;
>+  bool rv = mCallback->DoSendBlockingMessage(aCx, aMessageName, data, objects, &retval, aIsSync);
>+  sSendingSyncMessage = false;
...and if so, this doesn't quite work. If there is sync->rpc->sync calls, the latter sync gets through.
Only the caller who sets sSendingSyncMessage to true should unset it.

>+    if (aIsSync)
>+      return cc->SendSyncMessage(nsString(aMessage), data, cpows, aJSONRetVal);
Always use {} with if. Same also elsewhere



Those answered, r=me. 
Though, some more tests for nested sync and rpc calls would be good.
Attachment #800508 - Flags: review?(bugs) → review+
(Assignee)

Comment 18

4 years ago
Created attachment 804780 [details] [diff] [review]
part 1, v4

To reduce confusion around AutoEnterRPCTransaction, I put it closer to the Dispatch() calls, where we're guaranteed to have the monitor lock already. The only outlier is OnMaybeDequeueOne(), but it's safe here since there will never be an existing transaction transaction, and we're not blocked, so the link thread will always post to the event loop.
Attachment #800308 - Attachment is obsolete: true
Attachment #804780 - Flags: review?(cjones.bugs)
Comment on attachment 804780 [details] [diff] [review]
part 1, v4

>diff --git a/ipc/ipdl/ipdl/ast.py b/ipc/ipdl/ipdl/ast.py

<3

>diff --git a/ipc/ipdl/test/cxx/PTestRPC.ipdl b/ipc/ipdl/test/cxx/PTestRPC.ipdl

>+    urgent Test2_FirstCpow();

(I'd prefer to leave CPOWs out of here too.)

This looks great! :)  Very nice. r=me.
Attachment #804780 - Flags: review?(cjones.bugs) → review+
(Assignee)

Comment 20

4 years ago
(In reply to Olli Pettay [:smaug] from comment #17)
> >+  if (sSendingSyncMessage) {
> >+    // No kind of blocking send should be issued on top of a sync message.
> >+    return NS_ERROR_UNEXPECTED;
> >+  }
> oh this way. I thought you wanted to let rpc messages go through. If so
> aIsSync && sSendingSyncMessage ....

Good catch, thanks, that was indeed unintended.
https://hg.mozilla.org/mozilla-central/rev/9cfe826bafb9
https://hg.mozilla.org/mozilla-central/rev/19045a3843db
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27

Updated

4 years ago
Keywords: dev-doc-needed
You need to log in before you can comment on or make changes to this bug.