Closed Bug 528146 Opened 10 years ago Closed 10 years ago

Run plug-in code on the thread that starts in main()

Categories

(Core :: IPC, defect)

defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: cjones, Assigned: BenWa)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 7 obsolete files)

We've gotten away with running XPCOM off the real main thread on windows and linux, but not on Mac.  One option is to hack AppShell to init what we need on the main thread.  Another option is just to run on the main thread.  The latter seems best in the long term.
Blocks: 546775
Chris - can you explain how not having XPCOM in child processes any more impacts this work? What do we need to do here to make the Mac OS X situation work?
It probably will make it easier to flip the IO (currently main()) threads and the subprocess "main" thread.  We now just need to fiddle with thread initialization and shutdown in subprocesses.

I don't think there's any additional work we need specifically for OS X; chromium already provides an event loop impl for OS X.  Once we get the IO/"main" threads flipped, OS X IPC should Just Work.
Assignee: nobody → bgirard
Assignee: bgirard → joshmoz
I changed the description of the bug since I believe it better reflect the change we want to make. Is this change needed only on OS X or should this be done for all platforms?
Summary: Run XPCOM on the "main()" thread in subprocesses → Run the plug-in's "main()" thread in thread 0
What we really want is to "Run plug-in code on the thread that starts in main()".

I think we should do this for all platforms.  It's only a matter of time before we run into another platform library that expects to be above main().
Summary: Run the plug-in's "main()" thread in thread 0 → Run plug-in code on the thread that starts in main()
Attached patch Work in progress (obsolete) — Splinter Review
Here is what I have so far (I'll fix up the style before review).

I believe the problem is with my MessageLoop setup. They both end up blocked waiting and the browser hangs. Here is the GDB stacktrace of both threads. 

Perhaps I am misunderstanding how the MessageLoop are supposed to be setup. If everything appears to be setup correctly let me know and I'll keep debugging.
(gdb) where
#0  0x98e0c262 in __semwait_signal ()
#1  0x98e0bf1e in _pthread_cond_wait ()
#2  0x98e0dbb8 in pthread_cond_wait$UNIX2003 ()
#3  0x0373b8f1 in PR_WaitCondVar (cvar=0x4101e40, timeout=4294967295) at /Users/mozilla/mozilla/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:417
#4  0x01352812 in mozilla::CondVar::Wait (this=0xbfffe574, interval=4294967295) at BlockingResourceBase.cpp:373
#5  0x0128f3da in mozilla::ipc::AsyncChannel::Open (this=0xbfffe558, aTransport=0x4101ea0, aIOLoop=0xbfffe7cc) at /Users/mozilla/mozilla/mozilla-central/ipc/glue/AsyncChannel.cpp:147
#6  0x012ff3ee in mozilla::plugins::PPluginModuleChild::Open (this=0xbfffe550, aTransport=0x4101ea0, aOtherProcess=54330, aThread=0xbfffe7cc) at PPluginModuleChild.cpp:92
#7  0x01277edc in mozilla::plugins::PluginModuleChild::Init (this=0xbfffe550, aPluginFilename=@0xbfffe7ac, aParentProcessHandle=54330, aIOLoop=0xbfffe7cc, aChannel=0x4101ea0) at /Users/mozilla/mozilla/mozilla-central/dom/plugins/PluginModuleChild.cpp:158
#8  0x0003edbe in PluginModuleManager::PluginModuleManager (this=0xbfffe4f4, aPluginFilename=@0xbfffe7ac, aParentProcessHandle=54330, aIOLoop=0xbfffe7cc) at /Users/mozilla/mozilla/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:289
#9  0x0003db61 in XRE_InitChildProcess (aArgc=2, aArgv=0xbfffe990, aProcess=GeckoProcessType_Plugin) at /Users/mozilla/mozilla/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:407
#10 0x00001df1 in main (argc=4, argv=0xbfffe990) at /Users/mozilla/mozilla/mozilla-central/ipc/app/MozillaRuntimeMain.cpp:87
#11 0x00001d82 in start ()
(gdb) thread 3
[Switching to thread 3 (process 54349)]
0x98dde2fa in mach_msg_trap ()
(gdb) where
#0  0x98dde2fa in mach_msg_trap ()
#1  0x98ddea67 in mach_msg ()
#2  0x9424e00f in __CFRunLoopRun ()
#3  0x9424d0f4 in CFRunLoopRunSpecific ()
#4  0x9424cf21 in CFRunLoopRunInMode ()
#5  0x9061f380 in -[NSRunLoop(NSRunLoop) runMode:beforeDate:] ()
#6  0x014a86b4 in base::MessagePumpNSRunLoop::DoRun (this=0x4100be0, delegate=0xb0184e64) at /Users/mozilla/mozilla/mozilla-central/ipc/chromium/src/base/message_pump_mac.mm:650
#7  0x014a8ae9 in base::MessagePumpCFRunLoopBase::Run (this=0x4100be0, delegate=0xb0184e64) at /Users/mozilla/mozilla/mozilla-central/ipc/chromium/src/base/message_pump_mac.mm:213
#8  0x0143dd5e in MessageLoop::RunInternal (this=0xb0184e64) at /Users/mozilla/mozilla/mozilla-central/ipc/chromium/src/base/message_loop.cc:216
#9  0x0143dd75 in MessageLoop::RunHandler (this=0xb0184e64) at /Users/mozilla/mozilla/mozilla-central/ipc/chromium/src/base/message_loop.cc:199
#10 0x0143ddd9 in MessageLoop::Run (this=0xb0184e64) at /Users/mozilla/mozilla/mozilla-central/ipc/chromium/src/base/message_loop.cc:173
#11 0x0128e476 in mozilla::plugins::IOThreadChild::Init (this=0x41009a0) at /Users/mozilla/mozilla/mozilla-central/dom/plugins/IOThreadChild.cpp:73
#12 0x0146181d in base::Thread::ThreadMain (this=0x41009a8) at /Users/mozilla/mozilla/mozilla-central/ipc/chromium/src/base/thread.cc:156
#13 0x01490946 in ThreadFunc (closure=0x41009a8) at /Users/mozilla/mozilla/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:26
#14 0x98e0ba19 in _pthread_start ()
#15 0x98e0b89e in thread_start ()
(gdb)
Attachment #441095 - Flags: feedback?(jones.chris.g)
Comment on attachment 441095 [details] [diff] [review]
Work in progress

I was passing the wrong loop when setting up the PluginModuleChild. The plug-in is running now but there are some assertions firing now.
Attachment #441095 - Attachment is obsolete: true
Attachment #441095 - Flags: feedback?(jones.chris.g)
I am assigning this to myself since I have what appears to be a working patch. I will post in once I have cleaned it up.
Assignee: joshmoz → bgirard
Blocks: 558456
Status: NEW → ASSIGNED
Blocks: 559258
Attached patch Patch v1 (obsolete) — Splinter Review
Attached patch patch v2 (obsolete) — Splinter Review
Removed a #define.
Attachment #441577 - Flags: review?
Attachment #441573 - Attachment is obsolete: true
Attachment #441577 - Flags: review? → review?(jones.chris.g)
Latest patch doesn't compile for me on linux

nsX11ErrorHandler.cpp
/home/cjones/mozilla/mozilla-central/toolkit/xre/nsX11ErrorHandler.cpp:42:47: error: mozilla/plugins/PluginThreadChild.h: No such file or directory
/home/cjones/mozilla/mozilla-central/toolkit/xre/nsX11ErrorHandler.cpp:43: error: ‘mozilla’ has not been declared
/home/cjones/mozilla/mozilla-central/toolkit/xre/nsX11ErrorHandler.cpp: In function ‘int X11Error(Display*, XErrorEvent*)’:
/home/cjones/mozilla/mozilla-central/toolkit/xre/nsX11ErrorHandler.cpp:160: error: ‘PluginThreadChild’ has not been declared
make[5]: *** [nsX11ErrorHandler.o] Error 1
make[5]: *** Waiting for unfinished jobs....
/home/cjones/mozilla/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp: In function ‘nsresult XRE_InitChildProcess(int, char**, GeckoProcessType)’:
/home/cjones/mozilla/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:395: error: ‘S’ was not declared in this scope
/home/cjones/mozilla/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:396: error: expected ‘;’ before ‘if’
make[5]: *** [nsEmbedFunctions.o] Error 1
make[4]: *** [libs] Error 2
make[3]: *** [libs_tier_platform] Error 2
make[2]: *** [tier_platform] Error 2
make[1]: *** [default] Error 2
make: *** [build] Error 2
Yup, I introduced a typo in it when removed the ifdef (stupid vim :( ). I have a version without the typo on the try server right now.
Attached patch patch v3 (obsolete) — Splinter Review
Fixed a vim typo.
Attachment #441577 - Attachment is obsolete: true
Attachment #441637 - Flags: review?(jones.chris.g)
Attachment #441577 - Flags: review?(jones.chris.g)
Attachment #441637 - Attachment is patch: true
Attachment #441637 - Attachment mime type: application/octet-stream → text/plain
There is still the following build error only occurring on Linux I will need to fix:

/builds/slave/sendchange-linux-unittest/mozilla/toolkit/xre/nsX11ErrorHandler.cpp:42:47: error: mozilla/plugins/PluginThreadChild.h: No such file or directory
/builds/slave/sendchange-linux-unittest/mozilla/toolkit/xre/nsX11ErrorHandler.cpp:43: error: ‘mozilla’ has not been declared
/builds/slave/sendchange-linux-unittest/mozilla/toolkit/xre/nsX11ErrorHandler.cpp: In function ‘int X11Error(Display*, XErrorEvent*)’:
/builds/slave/sendchange-linux-unittest/mozilla/toolkit/xre/nsX11ErrorHandler.cpp:160: error: ‘PluginThreadChild’ has not been declared
I'll wait to review until the linux error is fixed, because I want to test locally.
This is really needed for Qt plugins, because Qt is not working correctly in non-main thread :(
(In reply to comment #15)
> This is really needed for Qt plugins, because Qt is not working correctly in
> non-main thread :(

I'm still working on this and I've made some progress. However I heard that Qt may use Quickdraw which we don't support out of process so we will not be able to run Qt out of process.
> 
> I'm still working on this and I've made some progress. However I heard that Qt
> may use Quickdraw which we don't support out of process so we will not be able

QuickDraw? I'm doing only Linux Qt port now... 

> to run Qt out of process.
Ohh Linux. Quickdraw is an issue with Qt on OSX, I didn't know other platforms were blocking on it. 

I have the patch working but it prevents the browser from shutting down correctly so I can't check it in yet. If you want to work on the Qt port I can post my patch you'll just have to ignore the shutdown issue.
would be nice.
Attached patch patch v4 (obsolete) — Splinter Review
I tried this patch on win/osx/linux with flash and everything runs good. I am still looking into the shutdown problem that are making the tests timeout. I'm tracing through the shutdown right now.
Attachment #441637 - Attachment is obsolete: true
Attachment #441637 - Flags: review?(jones.chris.g)
Attached patch patch v5Splinter Review
This patch passes tryserver.
Attachment #443886 - Attachment is obsolete: true
Attachment #444603 - Flags: review?
Attachment #444603 - Flags: review? → review?(jones.chris.g)
Comment on attachment 444603 [details] [diff] [review]
patch v5

On the whole this looks good, great job!  I know this couldn't have
been a fun patch to write ;).


>diff --git a/dom/plugins/IOThreadChild.cpp b/dom/plugins/IOThreadChild.cpp

Nit: this should live in ipc/glue, because it's not specific to OOPP.

>new file mode 100644
>--- /dev/null
>+++ b/dom/plugins/IOThreadChild.cpp
>+namespace mozilla {
>+namespace plugins {
>+
>+IOThreadChild::IOThreadChild(ProcessHandle aParentHandle) :
>+    MozillaChildThread(aParentHandle, MessageLoop::TYPE_IO) 
>+    , mIOLoop(NULL)
>+    , mIsInit(false)
>+{
>+    mMonitor = PR_NewMonitor();

Just for future reference, the classical re-entrant monitor
synchronization primitive is deprecated in mozilla code, and direct
use of PRMonitor* is doubly deprecated ;).  mozilla::Mutex/CondVar is
the wait/signal primitive of choice.  (Eventually we're going to make
mozilla::Monitor be a non-reentrant Mutex/CondVar wrapper, and keep
mozilla::ReentrantMonitor just for old code.)

That said, you don't need to do this synchronization; chromium code
does it for you.  See [1], in particular the |#ifdef
CHROMIUM_MOZILLA_BUILD| bit.  We turned off the synchronization [2]
because (IIRC) it caused a circular dependency between the IO and
"plugin" threads being initialized (that was a long time ago, though).
Making the IO thread not be the thread below main(), as your patch
does, should fix that, so we can turn the chromium sync back on.

That also said, I like this IOThreadChild class.  Note that the
browser-process code here [3] is doing essentially the same thing as
this class.  If you're feeling really really really frisky, you could
take the opportunity to unify them.  But that's certainly enough work
for a followup bug.

[1] http://hg.mozilla.org/mozilla-central/file/a908be341373/ipc/chromium/src/base/thread.cc#l138
[2] http://hg.mozilla.org/mozilla-central/file/a908be341373/ipc/glue/MozillaChildThread.h#l65
[3] http://hg.mozilla.org/mozilla-central/file/a908be341373/xpcom/build/nsXPComInit.cpp#l506

>diff --git a/dom/plugins/IOThreadChild.h b/dom/plugins/IOThreadChild.h
>new file mode 100644
>--- /dev/null
>+++ b/dom/plugins/IOThreadChild.h
>+// The IOThreadChild class represents a background thread where IPC IO MessageLoop
>+// live.
>+class IOThreadChild : public mozilla::ipc::MozillaChildThread {
>+public:
>+    IOThreadChild(ProcessHandle aParentHandle);
>+    ~IOThreadChild();
>+
>+    MessageLoop* GetIOLoop();
>+private:
>+    // Thread implementation:
>+    virtual void Init();
>+    virtual void CleanUp();
>+
>+    volatile MessageLoop* mIOLoop;
>+    volatile bool         mIsInit;
>+    PRMonitor* mMonitor;
>+

I don't see where these members are used, and the chromium thread
classes already have them, so they can be removed.

>diff --git a/dom/plugins/PluginThreadChild.h b/dom/plugins/PluginThreadChild.h
>--- a/dom/plugins/PluginThreadChild.h
>+++ b/dom/plugins/PluginThreadChild.h
> // The PluginThreadChild class represents a background thread where plugin instances
> // live.
>-class PluginThreadChild : public mozilla::ipc::MozillaChildThread {
>+class PluginThreadChild : private ChildThread {

That comment isn't true anymore, and this isn't really a "thread"
anymore in the not-main() sense.  We should remove this *ThreadChild
pattern in favor of *Process(Child|Parent) classes (or something
similar).  This will make a *lot* of upcoming code cleaner: IPDL
tests, jetpack, compositor, and so on.

I think it's iffy to have IPDL unit tests and plugins share the same
PluginThreadChild class.  For one, the IPDLUnitTestChildThread* code
in ipc/ipdl/test/cxx is now dead code.  Instead, I'd like to have
PluginProcessChild and IPDLUnitTestProcessChild classes, and keep
common code in MozillaProcessChild (from which Plugin*Child and
IPDL*Child would inherit).

>diff --git a/ipc/chromium/src/chrome/common/ipc_channel_win.cc b/ipc/chromium/src/chrome/common/ipc_channel_win.cc
>--- a/ipc/chromium/src/chrome/common/ipc_channel_win.cc
>+++ b/ipc/chromium/src/chrome/common/ipc_channel_win.cc
>@@ -80,6 +80,9 @@
>     output_queue_.pop();
>     delete m;
>   }
>+
>+  if (thread_check_.get())
>+    thread_check_.reset();
> }

What's this for?

> 
> bool Channel::ChannelImpl::Send(Message* message) {
>diff --git a/ipc/chromium/src/chrome/common/ipc_channel_win.h b/ipc/chromium/src/chrome/common/ipc_channel_win.h
>--- a/ipc/chromium/src/chrome/common/ipc_channel_win.h
>+++ b/ipc/chromium/src/chrome/common/ipc_channel_win.h
>@@ -20,7 +20,11 @@
>  public:
>   // Mirror methods of Channel, see ipc_channel.h for description.
>   ChannelImpl(const std::wstring& channel_id, Mode mode, Listener* listener);
>-  ~ChannelImpl() { Close(); }
>+  ~ChannelImpl() { 
>+    if (pipe_ != INVALID_HANDLE_VALUE) {
>+      Close();
>+    }
>+  }

This too?

>diff --git a/toolkit/xre/nsEmbedFunctions.cpp b/toolkit/xre/nsEmbedFunctions.cpp
>--- a/toolkit/xre/nsEmbedFunctions.cpp
>+++ b/toolkit/xre/nsEmbedFunctions.cpp
> nsresult
> XRE_InitChildProcess(int aArgc,
>                      char* aArgv[],
>@@ -332,39 +336,49 @@
>     return NS_ERROR_FAILURE;
>   }
> 
>-  MessageLoopForIO mainMessageLoop;
>+  // Associate this thread with a UI MessageLoop
>+  MessageLoopForUI uiMessageLoop;
>+  uiLoop = &uiMessageLoop;
>+  {
>+    IOThreadChild* ioThread = new IOThreadChild(parentHandle);
>+    if (!ioThread) {
>+      NS_LogTerm();
>+      return NS_ERROR_FAILURE;
>+    }

|new Foo()| is infallible, no need to check for NULL.

> 
>-  {
>-    ChildThread* mainThread;
>+    // Run IPC IO Message loop in the background.
>+    ChildProcess process(ioThread);
>+    sIOMessageLoop = ioThread->GetIOLoop();
>+    if (!sIOMessageLoop) {
>+      NS_LogTerm();
>+      return NS_ERROR_FAILURE;
>+    }
> 
>     switch (aProcess) {
>     case GeckoProcessType_Default:
>       NS_RUNTIMEABORT("This makes no sense");
>       break;
> 
>-    case GeckoProcessType_Plugin:
>-      mainThread = new PluginThreadChild(parentHandle);
>+#ifndef MOZ_IPDL_TESTS
>+    case GeckoProcessType_IPDLUnitTest:
>+      NS_RUNTIMEABORT("rebuild with --enable-ipdl-tests");
>       break;
>+#else 
>+    case GeckoProcessType_IPDLUnitTest:
>+#endif
>+    case GeckoProcessType_Plugin: {
>+      PluginThreadChild plugin(parentHandle, aProcess);
> 
>-    case GeckoProcessType_IPDLUnitTest:
>-#ifdef MOZ_IPDL_TESTS
>-      mainThread = new IPDLUnitTestThreadChild(parentHandle);
>-#else
>-      NS_RUNTIMEABORT("rebuild with --enable-ipdl-tests");
>-#endif

This is where I'd like to see something like

  switch(type) {
  case IPDLTest:
    IPDLTestProcessChild::Init();
    break;

  case Plugin:
    PluginProcessChild::Init();
    break;
  //...

>@@ -492,7 +506,8 @@
>   MessageLoop* ioLoop = XRE_GetIOMessageLoop();
>   NS_ABORT_IF_FALSE(!!ioLoop, "Bad shutdown order");
> 
>-  ioLoop->PostTask(FROM_HERE, new MessageLoop::QuitTask());
>+  //uiLoop->Quit(); 
>+  uiLoop->PostTask(FROM_HERE, new MessageLoop::QuitTask());

Should use |uiLoop->Quit();| here; we previously had to post the
QuitTask to |ioLoop| because it lived on a different thread than the
one from which this function is called.  Also, it might be worth
adding a comment here describing what chain of events the Quit() sets
off:
  (1) uiLoop starts quitting
  (2) uiLoop exits Run() in XRE_InitChildProcess()
  (3) ChildProcess goes out of scope and terminates the IO thread
  (4) ChildProcess joins the IO thread
  (5) exit()
(I had to walk through this in a debugger to remind myself ;) ).


Again, the approach of this patch looks very good.  I'm extremely
impressed that you had the patience to slog through the "sins of your
fathers" with this code; it's old and crufty.  Thanks for moving us in
the right direction!

I made a lot of comments here, and I don't want to them to frustrate
you: I view this bug as an opportunity to clean up a lot of
long-standing crud.  I'd be more than happy to take your excellent
start and pick the nits I listed above.
I'm certainly not frustrated, I expected to get more feedback before landing this and you addressed the areas of the patch where I was not 100% sure off.

> >diff --git a/ipc/chromium/src/chrome/common/ipc_channel_win.cc b/ipc/chromium/src/chrome/common/ipc_channel_win.cc
> >--- a/ipc/chromium/src/chrome/common/ipc_channel_win.cc
> >+++ b/ipc/chromium/src/chrome/common/ipc_channel_win.cc
> >@@ -80,6 +80,9 @@
> >     output_queue_.pop();
> >     delete m;
> >   }
> >+
> >+  if (thread_check_.get())
> >+    thread_check_.reset();
> > }
> 
> What's this for?
> 
One of the big issues I ran into is that the ipc_channel_win does not work the same as ipc_channel_possix. The ipc channel is created/destroyed on the main thread but the channel is opened on the IO Thread. When the channel is opened on the IO thread the tread_check is set to the IO Thread. On shutdown the object cannot be released from the main thread because thread_check says it is still owned by the IO Thread. With this change the lifecycle of the channel is:
(1) ipc_channel is constructed on the main thread.
(2) ipc_channel is connected, and owned, by the IO Thread.
(3) ipc_channel is closed and no longer owned by the IO Thread. (my 1st diff)
(4) ipc_channel is released from the main thread. The de-constructor tries to close the channel again but it is already closed so that is my second diff.

The first thing I had tried was to have the ipc_channel created and owned by the IO Thread but I was not able to do this because the main thread also needs to the ipc_channel. The only way I could conceive doing it would require too much refactoring, but if you can see a better way then we could do without these changes.

> > 
> > bool Channel::ChannelImpl::Send(Message* message) {
> >diff --git a/ipc/chromium/src/chrome/common/ipc_channel_win.h b/ipc/chromium/src/chrome/common/ipc_channel_win.h
> >--- a/ipc/chromium/src/chrome/common/ipc_channel_win.h
> >+++ b/ipc/chromium/src/chrome/common/ipc_channel_win.h
> >@@ -20,7 +20,11 @@
> >  public:
> >   // Mirror methods of Channel, see ipc_channel.h for description.
> >   ChannelImpl(const std::wstring& channel_id, Mode mode, Listener* listener);
> >-  ~ChannelImpl() { Close(); }
> >+  ~ChannelImpl() { 
> >+    if (pipe_ != INVALID_HANDLE_VALUE) {
> >+      Close();
> >+    }
> >+  }
> 
> This too?
> 
See (4) above.
Clean on mochitest-ipcplugins+IPDL/C++ on linux, will look into windows in the vmorning.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I think something else need to be changed for e10s branch.... 
BenWa could you make additional patch for e10s branch?
Depends on: 560630
Attachment #444603 - Flags: review?(jones.chris.g) → review+
Blocks: 828959
You need to log in before you can comment on or make changes to this bug.