When using async launch, the toplevel actor doesn't have a process handle

RESOLVED FIXED

Status

()

Core
IPC
RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: Benjamin Smedberg, Assigned: Benjamin Smedberg)

Tracking

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 -, fennec2.0b3+)

Details

Attachments

(2 attachments, 7 obsolete attachments)

(Assignee)

Description

7 years ago
http://mxr.mozilla.org/mozilla-central/source/js/jetpack/JetpackParent.cpp#56

We Launch the process and immediately call GetChildProcessHandle on it. But we don't actually get the child process handle until OnChannelConnected. 

http://mxr.mozilla.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#402

Thoughts? We need this handle for crash reporting in jetpack and content processes.
We just picked up the pid from there for ... historical reasons.  It's no sweat at all to just grab it when we fork the subprocess.  Then the OnConnected() callback can just check that the pids are the same.
(Assignee)

Comment 2

7 years ago
Maybe PToplevel::Open shouldn't take the process handle (since it might not exist yet) and we'll set it later?
(Assignee)

Updated

7 years ago
Assignee: nobody → bhsieh
(Assignee)

Updated

7 years ago
Blocks: 581335
>It's no sweat at all to just grab it when we fork the subprocess.
Not sure I understand this comment. I guess you're referring to the code around here: http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/process_util_linux.cc#54, and similar for other platforms? Or is there more abstracted code to deal with this?

Also, the way for the parent to get the ProcessHandle is to pass the child's pid to OpenPriviledgeProcessHandle, right? But the child still needs to pass the pid to the parent somehow. 

>Maybe PToplevel::Open shouldn't take the process handle (since it might not
> exist yet) and we'll set it later?
I couldn't quite figure out when this was defined, and the debugger took me into generated code. Where should I look to change this?
(Assignee)

Comment 4

7 years ago
PJetpack::Open is generated code, see ipc/ipdl/PJetpackParent.cpp and ipc/ipdl/_ipdlgen/mozilla/jetpack/PJetpackParent.h
(In reply to comment #3)
> >It's no sweat at all to just grab it when we fork the subprocess.
> Not sure I understand this comment. I guess you're referring to the code around
> here:
> http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/process_util_linux.cc#54,
> and similar for other platforms? Or is there more abstracted code to deal with
> this?
> 

I just meant that fork() and CreateProcess() both return the process identifier we want.  I don't recall if the chromium code saves those away, but if not, it's not hard to do.
So, I've read through the code and also run it through a debugger. As far as I can tell, the sequence of events is this:

Main thread:
1) JetpackProcessParent/GeckoChildProcessHost's constructor passes an InitializeChannel() call to the io thread.
2) JetpackParent does JetpackProcessParent->Launch(), which does asyncLaunch(), which passes a PerformAsyncLaunch() call to the io thread.

then, on IO thread:
3) InitializeChannel() is called, setting up the channel, flipping mChannelInitialized in GeckoChildProcessHost, and unblocking the asyncLaunch() call in the main thread.
4) PerformAsyncLaunch() calls LaunchApp(), which actually does the fork. The processID is now available.

Later,
5) OnChannelConnected() is called with the pid, not sure which thread. This is also known as the HELLO message, I believe.
---

All of that is to say, I don't think the process identifier is available when Launch() / AsyncLaunch() returns. (Though it's not totally clear to me how the channel gets set up before the process is forked.) I'll probably go with bsmedberg's suggestion to just set the process handle when OnChannelConnected() is called. This will require JetpackProcessParent to hold a reference to JetpackParent, though -- or maybe more generally, require the GeckoProcessParent to hold a reference to the top-level actor. Currently JetpackParent owns JetpackProcessParent, and I don't think there's a backwards reference.

Would appreciate feedback on this, especially if I've gotten anything confused. Also, whether I should do a specific solution for JetpackProcessParent to hold a reference to JetpackParent, or the more general GeckoProcessParent holding a reference to the top-level actor (is this possible?)
Created attachment 476461 [details] [diff] [review]
WIP

With this patch, the processID is retrieved right after the LaunchApp() call returns (where the fork() happens). 

If this approach is okay, probably I want to modify the template for a top-level actor so that it expects to have a reference to a GeckoChildProcessHost (I think all the derived classes do, anyways), and do the OtherProcess() change for all top-level actors instead of just for JetpackParent::OtherProcess() like I have it now. Also, I'd remove the ProcessHandle argument from the Open() call.
Created attachment 476462 [details] [diff] [review]
WIP with function names

For easier reading. bsmedberg, cjones, what do you think?
Attachment #476461 - Attachment is obsolete: true
(Assignee)

Updated

7 years ago
Attachment #476462 - Flags: review?(jones.chris.g)
blocking2.0: --- → ?
(Assignee)

Updated

7 years ago
Blocks: 578685
Comment on attachment 476462 [details] [diff] [review]
WIP with function names

>diff --git a/ipc/glue/GeckoChildProcessHost.cpp b/ipc/glue/GeckoChildProcessHost.cpp
>--- a/ipc/glue/GeckoChildProcessHost.cpp
>+++ b/ipc/glue/GeckoChildProcessHost.cpp
>@@ -436,16 +436,18 @@ GeckoChildProcessHost::PerformAsyncLaunc
> #else
> #  error Sorry
> #endif
> 
>   if (!process) {
>     return false;
>   }
>   SetHandle(process);
>+  mChildProcessHandle = process;
>+
> #if defined(XP_MACOSX)
>   mChildTask = child_task;
> #endif
> 
>   return true;
> }
> 
> void

This is really the meat of the patch.  The other part that's missing is to have process_util_win.cc get a privileged handle back from CreateProcess(), and we need to ditch the code in GeckoChildProcessHost that does that.  It'd also be nice to double-check that the process ID we back in GeckoChildProcessHost::OnChannelConnected is the same as GetProcessId(mChildProcessHandle).

You can delete the rest of the patch.  It's best to do this work in GeckoChildProcessHost so we don't have to duplicate it for PContent, PCompositor, ...
Created attachment 478481 [details] [diff] [review]
asynclaunch blocks until after fork
Attachment #476462 - Attachment is obsolete: true
Attachment #478481 - Flags: review?(jones.chris.g)
Attachment #476462 - Flags: review?(jones.chris.g)
Created attachment 478484 [details] [diff] [review]
WIP test

to test the above patch. right now it seems to hang after appearing to pass. seems like some sort of timing issue / race condition (maybe between asyncLaunch() and the destructor?), because printfs make the issue disappear.
Backtraces from the hang would aid in debugging.
Thread 1: 
#0  0x00007ff0585f9cfd in pthread_join (threadid=140670041954576, 
    thread_return=0x0) at pthread_join.c:89
#1  0x00007ff056bde583 in PlatformThread::Join (thread_handle=140670041954576)
    at /home/bhsieh/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:119
#2  0x00007ff056baa208 in base::Thread::Stop (this=0x13dd040)
    at /home/bhsieh/mozilla-central/ipc/chromium/src/base/thread.cc:114
#3  0x00007ff0568e1eb2 in ~BrowserProcessSubThread (this=0x13dd040, 
    __in_chrg=<value optimized out>)
    at /home/bhsieh/mozilla-central/ipc/glue/BrowserProcessSubThread.cpp:89
#4  0x00007ff056aa7861 in mozilla::ShutdownXPCOM (servMgr=0x0)
    at /home/bhsieh/mozilla-central/xpcom/build/nsXPComInit.cpp:751
#5  0x00007ff056aa7159 in NS_ShutdownXPCOM_P (servMgr=0x0)
    at /home/bhsieh/mozilla-central/xpcom/build/nsXPComInit.cpp:587
#6  0x00007ff055380396 in XRE_TermEmbedding ()
    at /home/bhsieh/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:223
#7  0x00007ff0568ef53e in mozilla::ipc::ScopedXREEmbed::Stop (
    this=0x7ffff73178bf)
    at /home/bhsieh/mozilla-central/ipc/glue/ScopedXREEmbed.cpp:100
#8  0x00007ff0568ef2e0 in ~ScopedXREEmbed (this=0x7ffff73178bf, 
    __in_chrg=<value optimized out>)
    at /home/bhsieh/mozilla-central/ipc/glue/ScopedXREEmbed.cpp:60
#9  0x00007ff055380f38 in XRE_InitParentProcess (aArgc=1, 
---Type <return> to continue, or q <return> to quit---
    aArgv=0x7ffff7317a28, 
    aMainFunction=0x7ff056a711ec <mozilla::_ipdltest::IPDLUnitTestMain(void*)>, aMainFunctionData=0x7ffff731852e)
    at /home/bhsieh/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:599
#10 0x00007ff055380fc1 in XRE_RunIPDLTest (aArgc=1, aArgv=0x7ffff7317a28)
    at /home/bhsieh/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:618
#11 0x00000000004006e1 in main (argc=2, argv=0x7ffff7317a28)
    at /home/bhsieh/mozilla-central/ipc/ipdl/test/cxx/app/TestIPDL.cpp:51


Thread 2:

#0  0x00007ff058600cdd in __libc_waitpid (pid=3584, 
    stat_loc=<value optimized out>, options=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:41
#1  0x00007ff056bf383c in WaitForChildExit (this=0x174dca0)
    at /home/bhsieh/mozilla-central/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc:112
#2  0x00007ff056bf3d69 in WillDestroyCurrentMessageLoop (this=0x174dca0)
    at /home/bhsieh/mozilla-central/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc:202
#3  0x00007ff056b82cc1 in ~MessageLoop (this=0x7ff04be18e20, 
    __in_chrg=<value optimized out>)
    at /home/bhsieh/mozilla-central/ipc/chromium/src/base/message_loop.cc:138
#4  0x00007ff056baa4ae in base::Thread::ThreadMain (this=0x13dd040)
    at /home/bhsieh/mozilla-central/ipc/chromium/src/base/thread.cc:166
#5  0x00007ff056bde377 in ThreadFunc (closure=0x13dd040)
    at /home/bhsieh/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:26
#6  0x00007ff0585f8a04 in start_thread (arg=<value optimized out>)
    at pthread_create.c:300
#7  0x00007ff0539f4d4d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8  0x0000000000000000 in ?? ()
Created attachment 478850 [details] [diff] [review]
WIP test

Realized I forgot to hg add before uploading the test patch. I think the interesting bits are in TestAsyncProcessIDParent::Main(), mainly.
Attachment #478484 - Attachment is obsolete: true
Created attachment 479218 [details] [diff] [review]
asynclaunch blocks until after fork, plus tests

The SyncHang test had to be modified because it was wrong, but for some reason (timing?) I didn't notice until I had made these changes as well. Anyways, this works now.

Worth noting that having a mChildProcessHandle > 0 no longer means that the process actually successfully launched (to the point of sending the HELLO message). For example, before this patch, if you launch a plugin process with an invalid path the handle would be 0, but with this patch the handle will be > 0 although mLaunched will be false. GeckoChildProcessHost destructor was changed to take that into account (that's one of the things that was causing the hang above), but I don't know if that assumption is used elsewhere.
Attachment #478481 - Attachment is obsolete: true
Attachment #478850 - Attachment is obsolete: true
Attachment #479218 - Flags: review?(jones.chris.g)
Attachment #478481 - Flags: review?(jones.chris.g)
Comment on attachment 479218 [details] [diff] [review]
asynclaunch blocks until after fork, plus tests

bsmedberg, are you familiar with the behavior of CreateProcess()?
This patch relies on fork() not doing IO; I had assumed
CreateProcess() was similar internally.  MSDN says

  Note that the function returns before the process has finished
  initialization. If a required DLL cannot be located or fails to
  initialize, the process is terminated. To get the termination status
  of a process, call GetExitCodeProcess.

but that's not very helpful.

>diff --git a/dom/plugins/PluginModuleChild.cpp b/dom/plugins/PluginModuleChild.cpp
>--- a/dom/plugins/PluginModuleChild.cpp
>+++ b/dom/plugins/PluginModuleChild.cpp
>@@ -179,16 +179,19 @@ PluginModuleChild::Init(const std::strin
> 
>     nsCString filename;
>     filename = aPluginFilename.c_str();
>     nsCOMPtr<nsILocalFile> pluginFile;
>     NS_NewNativeLocalFile(filename,
>                           PR_TRUE,
>                           getter_AddRefs(pluginFile));
> 
>+    if (!pluginFile)
>+        return false;

We don't want to "silently" fail here.  If we can't get the
pluginFile, we need to invoke breakpad.  One way would be to
NS_RUNTIMEABORT(), but since we deref pluginFile in the next statement
executed, I would just leave things as they are.

>diff --git a/ipc/chromium/src/base/process_util_win.cc b/ipc/chromium/src/base/process_util_win.cc
>--- a/ipc/chromium/src/base/process_util_win.cc
>+++ b/ipc/chromium/src/base/process_util_win.cc
>@@ -170,21 +170,22 @@ bool LaunchApp(const std::wstring& cmdli
>                     process_info.dwProcessId);
> 
>   // Handles must be closed or they will leak
>   CloseHandle(process_info.hThread);
> 
>   if (wait)
>     WaitForSingleObject(process_info.hProcess, INFINITE);
> 
>-  // If the caller wants the process handle, we won't close it.
>+  CloseHandle(process_info.hProcess);
>+
>+  // If the caller wants the process handle, open a privileged one here.
>   if (process_handle) {
>-    *process_handle = process_info.hProcess;
>-  } else {
>-    CloseHandle(process_info.hProcess);
>+    if (!OpenPrivilegedProcessHandle(process_info.dwProcessId, process_handle))
>+      NS_RUNTIMEABORT("can't open handle to child process");

Please make this behavior selectable with a switch like |bool
open_privileged_handle|, defaulting to |false|.

>diff --git a/ipc/glue/GeckoChildProcessHost.cpp b/ipc/glue/GeckoChildProcessHost.cpp
>--- a/ipc/glue/GeckoChildProcessHost.cpp
>+++ b/ipc/glue/GeckoChildProcessHost.cpp
>@@ -102,17 +102,17 @@ GeckoChildProcessHost::GeckoChildProcess
> 
> GeckoChildProcessHost::~GeckoChildProcessHost()
> 
> {
>   AssertIOThread();
> 
>   MOZ_COUNT_DTOR(GeckoChildProcessHost);
> 
>-  if (mChildProcessHandle > 0)
>+  if (mChildProcessHandle > 0 && mLaunched)
>     ProcessWatcher::EnsureProcessTerminated(mChildProcessHandle

No, we still need to clean up the subprocess even if |!mLaunched|.

> void
> GeckoChildProcessHost::OnChannelConnected(int32 peer_pid)
> {
>   MonitorAutoEnter mon(mMonitor);
>   mLaunched = true;
> 
>-  if (!base::OpenPrivilegedProcessHandle(peer_pid, &mChildProcessHandle))
>-      NS_RUNTIMEABORT("can't open handle to child process");
>+  NS_ASSERTION(peer_pid == base::GetProcId(mChildProcessHandle),
>+               "unexpected pid");
> 

Make this NS_ABORT_IF_FALSE() please.

Didn't review the test yet, pending CreateProcess() question and other
comments.
Attachment #479218 - Flags: review?(benjamin)
> Please make this behavior selectable with a switch like |bool
> open_privileged_handle|, defaulting to |false|.

Otherwise keep the default of 
>*process_handle = process_info.hProcess; ?

Or otherwise don't set process_handle at all? In that case, we could signal not to do the OpenPrivilegedProcessHandle() just by passing in a null pointer.
(Assignee)

Comment 18

7 years ago
I believe that CreateProcess finds the .exe file and checks to see if it is actually an executable program before returning. It doesn't wait for dependent DLLs to be found or any static initializers to run. I don't know whether it blocks on any UAC dialogs or other intermediate steps that might be part of creating a process.

So there is some I/O blocking as part of the call, I'm pretty sure.

I'm not sure why we can't just launch the process asynchronously and then communicate the handle back through a callback later, but I think I missed an IRC conversation about it.
It's definitely possible, just requires a bit more code. Right now each top-level actor has a GeckoChildProcessHost, so we could just change each actor to query its GCPH when it wants to know the pid of the other process (probably by overriding the virtual method in each of them rather than changing the code generation). The current patch contains the changes to GCPH, rather than requiring changes to each top-level actor, so that's nice.

Originally I was working on a patch to move the query-the-GCPH-for-pid code into the generated code, but I was told that in the future there could be actors without GCPHs.
OK, CreateProcess() behavior makes things harder.

Benedict, let's back up to attachment 476462 [details] [diff] [review], but take a slightly different tack.  Top-level actors will want a notification on process launch anyway, so let's build the process setter on top of that.
 - add AsyncListener::OnChannelConnected (and to SyncListener/RPCListener)
 - have AsyncChannel::OnChannelConnected (which runs on the IO thread) fire a task to the main thread that calls mListener->OnChannelConnected
 - add a virtual, default-no-op ToplevelActor::ChannelConnected interface, call that from OnChannelConnected
 - your JetpackParent would override ChannelConnected to be notified

http://hg.mozilla.org/mozilla-central/rev/2b2f565d7c8f is an example of how to add this ipc/glue+IPDL goop.

Then, add ToplevelActor::SetOtherSide() or something, can call that from ChannelConnected().
(Assignee)

Comment 21

7 years ago
I think this blocks Fennec and future multi-process Jetpack, but not FF4 final.
blocking2.0: ? → -
tracking-fennec: --- → ?
Created attachment 481391 [details] [diff] [review]
sets processID with callback

No test yet, though. I did run it through a debugger to make sure the event was firing and setting the member variable correctly.
Attachment #479218 - Attachment is obsolete: true
Attachment #479218 - Flags: review?(jones.chris.g)
Attachment #479218 - Flags: review?(benjamin)
Attachment #481391 - Flags: review?(jones.chris.g)

Updated

7 years ago
tracking-fennec: ? → 2.0+
Comment on attachment 481391 [details] [diff] [review]
sets processID with callback

>diff --git a/ipc/glue/AsyncChannel.cpp b/ipc/glue/AsyncChannel.cpp
>
>+AsyncChannel::FireOnChannelConnected(int32 peer_pid)

Nit: s/Fire/Dispatch/.

>diff --git a/ipc/ipdl/ipdl/lower.py b/ipc/ipdl/ipdl/lower.py
>+        # OnChannelConnected()
>+        onconnected = MethodDefn(MethodDecl('OnChannelConnected'))
>+        # FIXME: Do we need runtimeAbort for non-toplevel as above?

Yes.

Looks good, let's get a test and all ready to go :).
Attachment #481391 - Flags: review?(jones.chris.g) → review+
(Assignee)

Updated

7 years ago
Blocks: 602891
Created attachment 481972 [details] [diff] [review]
fixes windows build failure

Not totally sure about the CloseProcessHandle() stuff, but this builds on Windows now. Still no test.
Attachment #481391 - Attachment is obsolete: true
Assignee: bhsieh → benjamin
tracking-fennec: 2.0+ → 2.0b3+
Created attachment 491055 [details] [diff] [review]
Above, unrotted + PContent glue

I see this in the log

In file included from ../../dist/include/mozilla/dom/ContentParent.h:45,
                 from /home/cjones/mozilla/mozilla-central/toolkit/xre/nsAppRunner.cpp:65:
../../dist/include/mozilla/ipc/RPCChannel.h:87: warning: ‘virtual void mozilla::ipc::RPCChannel::RPCListener::OnChannelConnected(int32)’ was hidden
../../ipc/ipdl/_ipdlheaders/mozilla/dom/PContentParent.h:347: warning:   by ‘void mozilla::dom::PContentParent::OnChannelConnected()’

which seems to imply that the decl is wrong.  But the ContentParent::OnChannelConnected method was being invoked in gdb, so this patch should be OK for testing.
Blocks: 605832
(Assignee)

Comment 26

7 years ago
http://hg.mozilla.org/mozilla-central/rev/a450d42197b2 (not the content-process bits, I'll take care of that in bug 581335).

This is tested via the test_jetpack_crash.js tests.
(Assignee)

Updated

7 years ago
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Depends on: 613338
You need to log in before you can comment on or make changes to this bug.