Closed Bug 1223240 Opened 9 years ago Closed 8 years ago

Make it easier to set up top-level protocols

Categories: Core :: IPC, defect
Severity: normal

Tracking: RESOLVED FIXED
Target Milestone: mozilla46
Tracking Status: firefox45 --- affected; firefox46 --- fixed

People: (Reporter: billm, Assigned: billm)

Attachments: (2 files, 1 obsolete file)

Currently if you want to start a new top-level protocol PFoo, you need to use either PFoo::Open or PFoo::Bridge. The best documentation on these is probably in bugs 564086 and 613442 (comment 0 of both). They're somewhat awkward to use:

- Each protocol that participates in "Open" or "Bridge" needs to declare so in IPDL, and the syntax isn't very obvious.
- It's annoying that there are two separate mechanisms for creating top-level protocols. The only difference is whether you're going to keep one side of the channel for yourself (Open) or send both sides to other processes (Bridge).
- If you use Open, so that you are going to keep one side to yourself, you don't get immediate access to the opened actor. Instead you have to wait for an asynchronous callback. To cope with this, we have had to introduce nested event loops to wait for the new actor (see bugs 1109338 and 1057908).

I'd like to make it easier to create top-level actors. Here's how things work with the new patch:

Endpoint<PFooParent> parentEp;
Endpoint<PFooChild> childEp;
nsresult rv = PFoo::CreateEndpoints(parentPid, childPid, &parentEp, &childEp);

Now you can send parentEp and childEp to whichever thread or process you want. Endpoints can be sent over IPDL or to other threads using PostTask. Once an Endpoint has arrived at its destination process/thread, you need to create the top-level actor and bind it to the endpoint:

FooParent* parent = new FooParent();
bool rv1 = parentEp.Bind(parent, nullptr);
bool rv2 = parent->SendBar(...);

Once the actor is bound to the endpoint, it can send and receive messages. If you plan to keep one endpoint yourself, then you can Bind immediately and there's no need for a nested event loop.

I think this is a much easier concept to understand. One downside is that we're currently doing a pretty weak static analysis on the actor relationships to prove that you can't end up with two threads synchronously waiting on each other via different protocols. Protocols created using endpoints won't be checked with this analysis. I don't think the analysis is really buying us much since you can already get into lots of deadlock trouble in other ways (intr messages or high priority messages). I have some ideas on how to improve this sort of checking in other ways; I'd rather not have to deal with it in this bug though.
Attached patch patch (obsolete) — Splinter Review
Jed, can you review most of this patch? Bob, I think you just need to look over the Transport_win.cpp changes. The callers for CloseDescriptor and DuplicateDescriptor are in Endpoint, so you might want to look at that too. I think I found a problem in CloseDescriptor, so I made changes. I hope I understood how it's supposed to work.

The new tests here are basically copies of the tests for ::Open and ::Bridge, but now they use endpoints. I also fixed a bug in the Open test where it was posting a runnable to the wrong thread.

In case it helps, here's what the generated code looks like for PTestEndpointOpens::CreateEndpoints:

nsresult
CreateEndpoints(
        base::ProcessId aParentDestPid,
        base::ProcessId aChildDestPid,
        mozilla::ipc::Endpoint<mozilla::_ipdltest::PTestEndpointOpensParent>* aParent,
        mozilla::ipc::Endpoint<mozilla::_ipdltest::PTestEndpointOpensChild>* aChild)
{
    return mozilla::ipc::CreateEndpoints(mozilla::ipc::PrivateIPDLInterface(), aParentDestPid, aChildDestPid, PTestEndpointOpensMsgStart, PTestEndpointOpensMsgStartChild, aParent, aChild);
}

Endpoints are always passed using move references since they don't have a copy constructor. That makes it easier to keep track of the lifetimes of the transports they own. When you declare a Recv function that takes an Endpoint, the Endpoint will be passed by move reference.
Attachment #8685205 - Flags: review?(jld)
Attachment #8685205 - Flags: review?(bobowen.code)
Comment on attachment 8685205 [details] [diff] [review]
patch

Review of attachment 8685205 [details] [diff] [review]:
-----------------------------------------------------------------

I still need to look at this some more, especially the tests, but I have some comments.

::: ipc/glue/ProtocolUtils.h
@@ +556,5 @@
> +    static bool Read(const Message* aMsg, void** aIter, paramType* aResult)
> +    {
> +        MOZ_RELEASE_ASSERT(!aResult->mValid);
> +        aResult->mValid = true;
> +        if (!IPC::ReadParam(aMsg, aIter, reinterpret_cast<uint32_t*>(&aResult->mMode)) ||

Could you use static_assert to ensure that these casts will do the right thing?  Or read into temporaries and static_cast the values?  Or use a ContiguousEnumSerializer instead?
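A sketch of the "read into temporaries and static_cast" option suggested here; `Mode`, `ReadUint32`, and `ReadMode` are illustrative stand-ins rather than the real IPC::ReadParam machinery:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in enum; the real code deserializes a Transport mode.
enum class Mode : uint32_t { eClient, eServer };

// Stand-in for reading a raw uint32_t out of a message.
static bool ReadUint32(const uint32_t* aSrc, uint32_t* aOut) {
  if (!aSrc) {
    return false;
  }
  *aOut = *aSrc;
  return true;
}

// Read into a correctly-typed temporary, then static_cast with a range
// check -- avoiding the reinterpret_cast of &mMode that the comment flags.
static bool ReadMode(const uint32_t* aSrc, Mode* aResult) {
  uint32_t raw;
  if (!ReadUint32(aSrc, &raw)) {
    return false;
  }
  if (raw > static_cast<uint32_t>(Mode::eServer)) {
    return false;  // reject out-of-range wire values
  }
  *aResult = static_cast<Mode>(raw);
  return true;
}
```

This gets the same validation a ContiguousEnumSerializer would provide, without assuming the enum's storage matches the pointee type of the cast.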

::: ipc/glue/Transport_posix.cpp
@@ +65,5 @@
> +DuplicateDescriptor(const TransportDescriptor& aTd)
> +{
> +  TransportDescriptor result = aTd;
> +  result.mFd.fd = dup(aTd.mFd.fd);
> +  MOZ_RELEASE_ASSERT(result.mFd.fd != -1, "DuplicateDescriptor failed");

This is effectively an infallible allocation of a file descriptor; is that what we usually do?

::: ipc/glue/Transport_win.cpp
@@ +74,5 @@
> +  if (mServerPipe != INVALID_HANDLE_VALUE) {
> +    HANDLE serverDup;
> +    DWORD access = 0;
> +    DWORD options = DUPLICATE_SAME_ACCESS;
> +    MOZ_RELEASE_ASSERT(DuplicateHandle(aTd.mServerPipe, aTd.mPipeProcessId, &serverDup, access, options));

See below about the hSourceProcessHandle argument to the underlying ::DuplicateHandle() call.

@@ +89,5 @@
>  CloseDescriptor(const TransportDescriptor& aTd)
>  {
> +  // If mServerPipe is valid, then it's a handle to a pipe in another
> +  // process. We can close it using some weird options for
> +  // DuplicateHandle. (CloseHandle won't work.)

Nit: I'd call it “a handle owned by another process” or something like that, to make it clear that it's the handle itself that doesn't belong to the caller (which is why it's not valid for a call to CloseHandle), rather than the pipe object it references.

@@ +91,5 @@
> +  // If mServerPipe is valid, then it's a handle to a pipe in another
> +  // process. We can close it using some weird options for
> +  // DuplicateHandle. (CloseHandle won't work.)
> +  if (aTd.mServerPipe != INVALID_HANDLE_VALUE) {
> +    DuplicateHandle(aTd.mServerPipe, aTd.mPipeProcessId, nullptr, 0, DUPLICATE_CLOSE_SOURCE);

I think this won't work.  This calls the DuplicateHandle wrapper in ProtocolUtils.cpp, which assumes the handle to duplicate is valid in the calling process and passes GetCurrentProcess() as the hSourceProcessHandle argument.

But this handle is valid only in another process, so hSourceProcessHandle has to be a handle to that process (this is also documented in the MSDN page for DuplicateHandle where it explains the remote-closing trick, but there are enough handles flying around that it's easy to misread).

So if I understand correctly, this would need to do something like (partially borrowed from that wrapper):

ScopedProcessHandle pipeProcess;
if (!base::OpenProcessHandle(aTd.mPipeProcessId, &pipeProcess.rwget())) {
  return;
}
::DuplicateHandle(pipeProcess, aTd.mServerPipe, nullptr, nullptr, 0, FALSE, DUPLICATE_CLOSE_SOURCE);

Using nullptr/NULL/0 for the target process handle: MSDN doesn't say what to do with that, so I'm following an example I found in Breakpad.  I'm guessing that if that were a valid process handle (including INVALID_HANDLE_VALUE, which in this context *is* valid and indicates the current process, sigh), the duplication would succeed and the new handle would leak, and the goal is for the duplication to fail, because DUPLICATE_CLOSE_SOURCE closes the source handle regardless.
> I think this won't work.  This calls the DuplicateHandle wrapper in ProtocolUtils.cpp, which
> assumes the handle to duplicate is valid in the calling process and passes
> GetCurrentProcess() as the hSourceProcessHandle argument.

Doh. Yeah, you're right. In that case, I wonder if we ever need to use the broker to duplicate the handle. Maybe Bob can answer that question.
Comment on attachment 8685205 [details] [diff] [review]
patch

Review of attachment 8685205 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry it took me a few days to reply; it's been quite a few months since I looked at this code and my memory of it has been dulled by repeated abuse from NPAPI and printing code in the meantime. :-)

Hopefully, I've understood things and these comments help.

One thing to remember on Windows is that for child processes (assuming they have a strong enough sandbox) we can only duplicate our own handles out of our process to a target process, via the broker.
Rules have to be added to the policy for different handle types, with separate rules for duplicating to the broker or to other child processes (that the broker knows about).
We can't duplicate handles from other processes into our process at all.

::: ipc/glue/ProtocolUtils.h
@@ +543,5 @@
> +
> +        // We duplicate the descriptor so that our own file descriptor remains
> +        // valid after the write. An alternative would be to set
> +        // aParam.mTransport.mValid to false, but that won't work because aParam
> +        // is const.

Why would we want the file/Transport descriptor to remain valid?

If we're passing over IPC, don't we need this Endpoint to become invalid, similar to the use of Move to manage lifetimes?

(Assuming the endpoint is for a different pid, on Windows the handle in that descriptor would already be invalid for us ... I think.)

Could mValid just be made mutable?

::: ipc/glue/Transport_win.cpp
@@ +52,5 @@
>  }
>  
>  Transport*
>  OpenDescriptor(const TransportDescriptor& aTd, Transport::Mode aMode)
>  {

I think we could assert here that aTd.mPipeProcessId is our pid.

Ah, just noticed this is already done in Bind, but I guess it would catch other uses.

@@ +74,5 @@
> +  if (mServerPipe != INVALID_HANDLE_VALUE) {
> +    HANDLE serverDup;
> +    DWORD access = 0;
> +    DWORD options = DUPLICATE_SAME_ACCESS;
> +    MOZ_RELEASE_ASSERT(DuplicateHandle(aTd.mServerPipe, aTd.mPipeProcessId, &serverDup, access, options));

As Jed points out, this handle has already been duplicated to the parent destination pid (in CreateTransport), so this would only work if that were us, in which case we wouldn't be sending it over IPC.

I should have given this a different name from the real function, sorry.
We should think about doing that.

@@ +91,5 @@
> +  // If mServerPipe is valid, then it's a handle to a pipe in another
> +  // process. We can close it using some weird options for
> +  // DuplicateHandle. (CloseHandle won't work.)
> +  if (aTd.mServerPipe != INVALID_HANDLE_VALUE) {
> +    DuplicateHandle(aTd.mServerPipe, aTd.mPipeProcessId, nullptr, 0, DUPLICATE_CLOSE_SOURCE);

I suppose it all depends on which process is trying to close the descriptor.

I'm not sure I really fully understand this, but it seems to me that the pipe isn't owned by any process, but the handles to it are only valid in a specific process.

So if, aTd.mPipeProcessId is our pid then CloseHandle should work, if not and we're the broker then we could use the real version of DuplicateHandle and DUPLICATE_CLOSE_SOURCE.

If another process needs to do it for some reason then I think we'll need to add some new CloseHandle function to the broker, although how we'd make that safe I'm not sure.
Although I suppose that this would only happen if it was one we had duplicated ourselves (via the broker) or that we had specifically been sent for some bizarre reason.

::: ipc/glue/Transport_win.h
@@ +20,5 @@
>  struct TransportDescriptor
>  {
>    std::wstring mPipeName;
>    HANDLE mServerPipe;
> +  base::ProcessId mPipeProcessId;

Maybe it would be clearer if these were called mServerPipeHandle and mPipeHandleProcessId?
Attachment #8685205 - Flags: review?(bobowen.code) → review-
I guess it's probably not a good idea for the broker to allow other processes to close handles that they don't own. We could have some way of tracking that the closing process was the original creator of those handles, but that would be complicated and maybe still insecure.

Another option would be to try to make Windows work more like POSIX systems here: we wouldn't duplicate the pipe into the receiver process until the time when we actually send it. I'm not sure if this would work. But right now we're in this situation because we basically want to "un-send" the pipe if an error happens, so it would simplify things if we waited to send it until we're certain that we want to send it.

Another option is to basically allow these pipes to leak if an error happens. The only places we call CloseDescriptor are in error handling paths. We could implement this by making the Windows versions of CloseDescriptor and DuplicateDescriptor be empty. (DuplicateDescriptor would return the original descriptor.) This would also have the consequence that if someone created a pair of Endpoints and then never used them (maybe because an error happened) the pipe would leak.

Bob, this is probably your decision. What do you think?
Flags: needinfo?(bobowen.code)
Talked this over with Bob today. We're going to try to make Windows work more like POSIX here. It seems like that should work.
Flags: needinfo?(bobowen.code)
Attached patch patch v2Splinter Review
Bob, here's what we talked about. One thing I realized is that we can still leak a handle, as follows:

- We create the pipe locally.
- We serialize the Endpoint out to a Message via ParamTraits. In doing so, we duplicate the HANDLE into the destination process.
- We try to send the message and fail for some reason.

I don't think this is a huge problem in practice. If we fail to send, it's usually because the other side is dead already. So leaking something there is no problem.

The changes to the Transport code are still valuable, I think, because they ensure that someone who creates an endpoint and then fails to send it won't cause a leak.

I'm a bit concerned about all the release mode assertions on handle operations. The problem is that ParamTraits::Write doesn't allow us to return an error code. Fixing that would be a huge amount of work.

If duplicating the handle locally fails, we're probably fine crashing. But I'm worried that TransferHandleToProcess might legitimately fail if the other process is gone or something. In that case we might be better off ignoring the error and sending an invalid handle. If that handle ever actually arrives anywhere, we could assert that it's valid at that time. Does that sound reasonable Bob?

Jed, I'll ask you for review again once this looks all right to Bob.
Attachment #8685205 - Attachment is obsolete: true
Attachment #8685205 - Flags: review?(jld)
Attachment #8689272 - Flags: review?(bobowen.code)
Comment on attachment 8689272 [details] [diff] [review]
patch v2

Review of attachment 8689272 [details] [diff] [review]:
-----------------------------------------------------------------

Some questions here, but the main problem is still HANDLE closing.
Seems to be a bit of a catch-22 here.

I need to think some more.

::: ipc/glue/ProtocolUtils.h
@@ +544,5 @@
> +        // We duplicate the descriptor so that our own file descriptor remains
> +        // valid after the write. An alternative would be to set
> +        // aParam.mTransport.mValid to false, but that won't work because aParam
> +        // is const.
> +        mozilla::ipc::TransportDescriptor desc = mozilla::ipc::DuplicateDescriptor(aParam.mTransport);

Was there a reason you didn't want to make mValid mutable?
Having the Endpoint become invalid when sent, seems to be the right thing to me.

::: ipc/glue/Transport_win.cpp
@@ +63,5 @@
> +  HANDLE handleDup;
> +  DWORD access = 0;
> +  DWORD options = DUPLICATE_SAME_ACCESS;
> +  bool ok = DuplicateHandle(source, pid, &handleDup, access, options);
> +  MOZ_RELEASE_ASSERT(ok);

I agree that maybe this shouldn't be a release assert here as if the other process has died, we probably wouldn't want to kill this one.

Shame that write doesn't return an error, as you say.

I think you're also right on your other point: if this works and the actual send fails (presumably because the other process has died suddenly in the interim), then we're OK, because the handle really belongs to the other process and should be cleaned up.

@@ +67,5 @@
> +  MOZ_RELEASE_ASSERT(ok);
> +
> +  // Now close our own copy of the handle (we're supposed to be transferring,
> +  // not copying).
> +  CloseHandle(source);

We could just add DUPLICATE_CLOSE_SOURCE to the options.

@@ +113,5 @@
>  void
>  CloseDescriptor(const TransportDescriptor& aTd)
>  {
> +  // We're closing our own local copy of the pipe.
> +

nit: extra line

Also, if we did think about not duping the descriptor on send (to keep our own copy), we'd always have already closed this handle, I think.

Actually, in the existing non-Endpoint code with TransferHandleToProcess, wouldn't we have already closed it?
Closing twice is bad, AIUI, because of handle reuse.

::: ipc/glue/Transport_win.h
@@ +39,5 @@
>    typedef mozilla::ipc::TransportDescriptor paramType;
>    static void Write(Message* aMsg, const paramType& aParam)
>    {
> +    HANDLE pipe = mozilla::ipc::TransferHandleToProcess(aParam.mServerPipeHandle,
> +                                                        aParam.mDestinationProcessId);

I think we should set mServerPipeHandle to INVALID_HANDLE_VALUE, to prevent double closing.
Ah, we can't because aParam is const, damn.

@@ +54,5 @@
> +    if (!r) {
> +      return r;
> +    }
> +    if (aResult->mServerPipeHandle != INVALID_HANDLE_VALUE) {
> +      MOZ_RELEASE_ASSERT(aResult->mDestinationProcessId == base::GetCurrentProcId());

So, we are saying we would always send a descriptor directly and never via the broker?
Basically, the problem is that the HANDLE is initially duped for the current process: with the old way of doing things we always want to close the handle and have CloseDescriptor do nothing.
For the new way we only want to close it when CloseDescriptor is called.

I came up with a solution and was about to upload it yesterday, when I noticed what I think is an issue in the existing posix code.

We create the FileDescriptors in CreateTransport as auto_close (i.e. close on sending), but then we still close if there is an error.
It looks to me like once the FileDescriptor has been written it gets added to a FileDescriptorSet and should be closed no matter what (if auto_close is set).
I assume that closing fds twice can also cause problems for posix in a multi-threaded environment.

(As an aside, I noticed that the DuplicateDescriptor implementation for posix, wasn't setting this flag, so we were going to leak in the new world.)

Anyway, after reading [1], I think that this auto_close behaviour means that we should actually do the duping in the ParamTraits Write for TransportDescriptor and have the original creators of the TransportDescriptors always close their descriptors.
On posix we dup an fd that the IPC machinery will close and on Windows we dup the HANDLE to something that is really up to the other process or OS to close.
I updated the patch as I was thinking it through, so I've uploaded as an example.

I actually think the best way to do this is to make TransportDescriptor take proper ownership of its HANDLE/fd and only allow move, with the HANDLE/fd being set to INVALID_HANDLE_VALUE or -1 to invalidate.
It would then close in its destructor if valid and we could get rid of CloseDescriptor.
OpenDescriptor would become a member on TransportDescriptor, called CreateTransport or something, and would also invalidate the HANDLE/fd as ownership is passed to the Transport.
But that seems a lot to add, so if you like the idea, maybe this could be done in a follow-up at some point.


[1] https://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/chrome/common/ipc_message_utils.h#459
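A sketch of that follow-up idea under POSIX assumptions; `OwningTransportDescriptor` and `ReleaseToTransport` are invented names illustrating the move-only, close-in-destructor shape, not code from the patch:

```cpp
#include <cassert>
#include <unistd.h>  // close(), pipe() for the POSIX fd case
#include <utility>

// Move-only descriptor that owns its fd and closes it in the destructor,
// making a separate CloseDescriptor() free function unnecessary.
class OwningTransportDescriptor {
public:
  explicit OwningTransportDescriptor(int aFd = -1) : mFd(aFd) {}

  OwningTransportDescriptor(const OwningTransportDescriptor&) = delete;
  OwningTransportDescriptor& operator=(const OwningTransportDescriptor&) = delete;

  OwningTransportDescriptor(OwningTransportDescriptor&& aOther)
      : mFd(aOther.mFd) {
    aOther.mFd = -1;  // invalidate the source so only one owner closes
  }

  ~OwningTransportDescriptor() {
    if (mFd != -1) {
      close(mFd);  // closes only if ownership was never handed off
    }
  }

  // Stands in for the proposed member that passes ownership to a Transport.
  int ReleaseToTransport() {
    int fd = mFd;
    mFd = -1;
    return fd;
  }

  bool IsValid() const { return mFd != -1; }

private:
  int mFd;
};
```

The invalidate-on-move and invalidate-on-release steps are what prevent the double-close hazard discussed above: by construction, at most one live object can ever reach the destructor with a valid fd.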
Attachment #8689272 - Flags: review?(bobowen.code)
(In reply to Bob Owen (:bobowen) from comment #9)
> I assume that closing fds twice can also cause problems for posix in a
> multi-threaded environment.

Quite.  Operations that create new file descriptors allocate the lowest unused nonnegative integer, so there's nontrivial risk of closing something else's fd instead of just getting EBADF.  (POSIX specifies this for at least {open, pipe, dup}, and in general it's the traditional behavior.)
Comment on attachment 8691432 [details] [diff] [review]
bug1223240example.patch

Review of attachment 8691432 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Bob. This looks nice to me. Is it okay if I ask Jed to review it?
Attachment #8691432 - Flags: review?(jld)
(In reply to Bill McCloskey (:billm) from comment #11)
> Comment on attachment 8691432 [details] [diff] [review]
> bug1223240example.patch
> 
> Review of attachment 8691432 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks Bob. This looks nice to me. Is it okay if I ask Jed to review it?

Of course, no problem.
Although you need to update the header back to you before landing, looks like I folded your patch into mine at some point.
(I think I started messing with something and then thought it would be easier to do that with your changes in place).
Comment on attachment 8691432 [details] [diff] [review]
bug1223240example.patch

Review of attachment 8691432 [details] [diff] [review]:
-----------------------------------------------------------------

I haven't dealt with top-level protocols much before this bug, so this needed more staring at the code than usual.  Some minor comments about the tests, but the rest of it looks good.

::: ipc/ipdl/test/cxx/PTestEndpointBridgeMain.ipdl
@@ +14,5 @@
> +child:
> +  Start();
> +
> +parent:
> +  Bridged(Endpoint<PTestEndpointBridgeMainSubParent> endpoint);

Nit: There's a state machine declared but this message isn't in it?  Not that it matters much until/unless states are checked, but it seems like a good idea for the tests to set a good example.  I could figure out what was going on in the test by reading through the code, but having the expected behavior specified helps.

@@ +16,5 @@
> +
> +parent:
> +  Bridged(Endpoint<PTestEndpointBridgeMainSubParent> endpoint);
> +
> +  __delete__();

__delete__ on a top-level protocol?  It looks like this isn't used (vs. Close()ing the MessageChannel), and part of lower.py suggests that it wouldn't work anyway ("yet").  Same with the other protocols.

::: ipc/ipdl/test/cxx/PTestEndpointBridgeSub.ipdl
@@ +10,5 @@
> +protocol PTestEndpointBridgeSub {
> +child:
> +  Ping();
> +
> +  Bridged(Endpoint<PTestEndpointBridgeMainSubChild> endpoint);

See above about the state machines.
Attachment #8691432 - Flags: review?(jld) → review+
https://hg.mozilla.org/mozilla-central/rev/742d85ac5b33
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
See Also: → 1374258