Closed Bug 535077 Opened 15 years ago Closed 14 years ago

Failing to exec() mozilla-runtime (plugin-runtime) for a sync launch causes Gecko to hang

Categories

(Core :: IPC, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+
blocking1.9.2 --- -

People

(Reporter: cjones, Assigned: benedict)

References

Details

(Keywords: hang)

Attachments

(1 file, 5 obsolete files)

This is because we wait on a monitor being notified, and it never happens.  Quick fix is to time out after k seconds, better fix is to asynchronously launch all our subprocesses.  AFAIK we already have bugs for the latter.

I'm not very happy with this patch; it's just a stopgap.
Attachment #417869 - Flags: review?(bent.mozilla)
Note: bug 535090 describes one reason that exec() can fail.
See Also: → 535090
Version: unspecified → Trunk
Comment on attachment 417869 [details] [diff] [review]
time out if process isn't ready after 10 seconds

Sigh.
Attachment #417869 - Flags: review?(bent.mozilla) → review+
I've been debating whether or not to push this for the last few days.  I've decided not to; it would just add too much suck.  If we don't get async plugin launch done before this change becomes necessary, I think instead we could do something like the JS slow-script timeout.  We'll want that anyway for generic NPAPI operations.
Depends on: 535090
See Also: 535090
No longer depends on: 535090
Blocks: OOPP
Bug 558146 is on Windows so if it is a duplicate of this bug,
the Platform of this bug shouldn't be set to Linux, but All.

Besides if I didn't miss something this bug is about fixing the hang,
but there should also be some notification to the user why mozilla-runtime couldn't be run and what to do to fix the problem.
Summary: Failing to exec() mozilla-runtime for a sync launch causes Gecko to hang → Failing to exec() mozilla-runtime (plugin-runtime) for a sync launch causes Gecko to hang
Probably don't need to get this into 3.6.6, but sounds worth doing for the cases where virus scanners etc block plugin-runtime, or silly users remove it, or the other cases we've been discovering.
Assignee: nobody → bhsieh
blocking1.9.2: --- → ?
I can speak somehow speak for the silly or whatever classified users, having a full .exe process running you can just flip with taskmanager in mswin it's no surprise that users do it - to add some feedback from a users perspective.

As message I suggest to just inform the user that "There was a problem to start the %s-Plugin." - and that's it. If it crashes there are no specifics as well.

Could be made in the same design like the message you get as if a plugin (more precise: the plugin-container.exe) was removed from memory for whatever reason.
Not going to "block" a branch release on it but would consider approving a patch if you've solved cjones concern that "it would just add too much suck" (comment 3).
blocking1.9.2: ? → -
To be clear, what we should implement here is a timeout the same way we timeout RPC calls in general, so that by default it will time out after 10, or 45, or whatever seconds.
Uses the dom.ipc.plugins.timeSecs pref.
Attachment #459981 - Flags: review?(jones.chris.g)
Comment on attachment 459981 [details] [diff] [review]
time out if process isn't ready after X (pref or default of 10) seconds, plus test

One large-ish change: we don't want to conflate the reply timeout with
process launching.  They're very separate beasts, and ironically we'll
probably eventually want the process launch timeout to be shorter than
the reply timeout, since all the code in that path is under our
control (dllmains excepted).  Something like
dom.ipc.plugins.processLaunchTimeoutSecs with the same semantics and
values as the reply timeout (45 secs for release builds, 0=infinity
for debug) should be OK for now.  We can fiddle with the value later.

>diff --git a/dom/plugins/PluginModuleParent.cpp b/dom/plugins/PluginModuleParent.cpp
>--- a/dom/plugins/PluginModuleParent.cpp
>+++ b/dom/plugins/PluginModuleParent.cpp
>@@ -77,23 +77,33 @@
> };
> 
> // static
> PluginLibrary*
> PluginModuleParent::LoadModule(const char* aFilePath)
> {
>     PLUGIN_LOG_DEBUG_FUNCTION;
> 
>+    PRInt32 prefSecs = nsContentUtils::GetIntPref(kTimeoutPref, 0);
>+
>     // Block on the child process being launched and initialized.
>     PluginModuleParent* parent = new PluginModuleParent(aFilePath);

Since this function is now branchy, please make this an
|nsAutoPtr<PluginModuleParent>| for which we |return parent.forget();|
in the success case below.  Really should have been an auto pointer
from the start, that's my fault, sorry.

>-    parent->mSubprocess->Launch();
>+    bool retval = parent->mSubprocess->Launch(prefSecs * 1000);
>+    if (!retval) {
>+        // XXX Is this okay?
>+        parent->mShutdown = true;
>+        delete parent;
>+        return nsnull;
>+    }
>     parent->Open(parent->mSubprocess->GetChannel(),
>                  parent->mSubprocess->GetChildProcessHandle());
> 
>-    TimeoutChanged(kTimeoutPref, parent);
>+    int32 syncTimeoutMs = (prefSecs > 0) ? (1000 * prefSecs) :
>+                          SyncChannel::kNoTimeout;
>+    parent->SetReplyTimeoutMs(syncTimeoutMs);
> 

I don't like the copy and paste of this code, but with a separate
process-launch timeout this change will go away.

>diff --git a/dom/plugins/PluginProcessParent.h b/dom/plugins/PluginProcessParent.h
>--- a/dom/plugins/PluginProcessParent.h
>+++ b/dom/plugins/PluginProcessParent.h
>@@ -58,17 +58,17 @@
> {
> public:
>     PluginProcessParent(const std::string& aPluginFilePath);
>     ~PluginProcessParent();
> 
>     /**
>      * Synchronously launch the plugin process.
>      */
>-    bool Launch();
>+    bool Launch(PRInt32 timeoutSecs);
> 

Er, |timeoutMs|?  Please describe the param in the comment here.

>diff --git a/ipc/glue/GeckoChildProcessHost.cpp b/ipc/glue/GeckoChildProcessHost.cpp
>--- a/ipc/glue/GeckoChildProcessHost.cpp
>+++ b/ipc/glue/GeckoChildProcessHost.cpp
>@@ -62,16 +62,18 @@
> #ifdef XP_WIN
> #include "nsIWinTaskbar.h"
> #define NS_TASKBAR_CONTRACTID "@mozilla.org/windows-taskbar;1"
> #endif
> 
> using mozilla::MonitorAutoEnter;
> using mozilla::ipc::GeckoChildProcessHost;
> 
>+static const int32 kExecTimeoutMs = 10000;
>+

We should default to infinite timeout, see below.  This change can go
away.

> template<>
> struct RunnableMethodTraits<GeckoChildProcessHost>
> {
>     static void RetainCallee(GeckoChildProcessHost* obj) { }
>     static void ReleaseCallee(GeckoChildProcessHost* obj) { }
> };
> 
> GeckoChildProcessHost::GeckoChildProcessHost(GeckoProcessType aProcessType,
>@@ -126,38 +128,41 @@
>     } else {
>       mGroupId.AssignLiteral("-");
>     }
>   }
> }
> #endif
> 
> bool
>-GeckoChildProcessHost::SyncLaunch(std::vector<std::string> aExtraOpts)
>+GeckoChildProcessHost::SyncLaunch(std::vector<std::string> aExtraOpts, int aTimeoutMs)
> {
> #ifdef XP_WIN
>   InitWindowsGroupID();
> #endif
> 
>+  int timeoutMs = (aTimeoutMs > 0) ? aTimeoutMs : kExecTimeoutMs;

Please make this 

  PRIntervalTime timeoutTicks = (aTimeoutMs > 0) ? PR_MillisecondsToInterval(aTimeoutMs) : PR_INTERVAL_NO_TIMEOUT;

>   MessageLoop* ioLoop = XRE_GetIOMessageLoop();
>   NS_ASSERTION(MessageLoop::current() != ioLoop, "sync launch from the IO thread NYI");
> 
>   ioLoop->PostTask(FROM_HERE,
>                    NewRunnableMethod(this,
>                                      &GeckoChildProcessHost::PerformAsyncLaunch,
>                                      aExtraOpts));
>-
>   // NB: this uses a different mechanism than the chromium parent
>   // class.
>   MonitorAutoEnter mon(mMonitor);
>-  while (!mLaunched) {
>+  while (!mChannelInitialized) {

Um, no.  This change makes SyncLaunch() have the same semantics as
AsyncLaunch().  Instead, you want "has this timed out?" logic like
here:
http://mxr.mozilla.org/mozilla-central/source/ipc/glue/SyncChannel.cpp#244 .

>     mon.Wait();
>   }
>-
>-  return true;
>+  if (!mLaunched) {
>+    printf ("waiting %i ms\n, timeoutMs\n", timeoutMs);

Kill this printf, or make it |if (LoggingEnabled()) ...|.

>diff --git a/ipc/ipdl/test/cxx/PTestSyncHang.ipdl b/ipc/ipdl/test/cxx/PTestSyncHang.ipdl

Holding off on reviewing the test until v2.

This is r- for the change in SyncLaunch() semantics.
Attachment #459981 - Flags: review?(jones.chris.g) → review-
Attached patch fixes review comments (obsolete) — Splinter Review
Passes on tryserver, by the way!
Attachment #417869 - Attachment is obsolete: true
Attachment #459981 - Attachment is obsolete: true
Attachment #460787 - Flags: review?(jones.chris.g)
Comment on attachment 460787 [details] [diff] [review]
fixes review comments

>diff --git a/dom/plugins/PluginModuleParent.cpp b/dom/plugins/PluginModuleParent.cpp
>--- a/dom/plugins/PluginModuleParent.cpp
>+++ b/dom/plugins/PluginModuleParent.cpp
> // static
> PluginLibrary*
> PluginModuleParent::LoadModule(const char* aFilePath)
> {
>     PLUGIN_LOG_DEBUG_FUNCTION;
> 
>+    PRInt32 prefSecs = nsContentUtils::GetIntPref(kLaunchTimeoutPref, 0);
>+
>     // Block on the child process being launched and initialized.
>-    PluginModuleParent* parent = new PluginModuleParent(aFilePath);
>-    parent->mSubprocess->Launch();
>+    nsAutoPtr<PluginModuleParent> parent(new PluginModuleParent(aFilePath));
>+    bool retval = parent->mSubprocess->Launch(prefSecs * 1000);

s/retval/launched/?

>diff --git a/ipc/ipdl/test/cxx/TestSyncHang.cpp b/ipc/ipdl/test/cxx/TestSyncHang.cpp
>new file mode 100644
>--- /dev/null
>+++ b/ipc/ipdl/test/cxx/TestSyncHang.cpp
>+void
>+TestSyncHangParent::Main()
>+{
>+  vector<string> args;
>+  args.push_back("fake/path");
>+  gSyncHangSubprocess = new mozilla::ipc::GeckoChildProcessHost(GeckoProcessType_Plugin);
>+  bool retval = gSyncHangSubprocess->SyncLaunch(args, 2);
>+  if (retval)
>+    fail("Calling SyncLaunch with an invalid path should return false");
>+  // XXX Is there a race with the IPDLUnitTest shutdown here?

Nope.

>diff --git a/modules/libpref/src/init/all.js b/modules/libpref/src/init/all.js
>--- a/modules/libpref/src/init/all.js
>+++ b/modules/libpref/src/init/all.js
>@@ -1253,23 +1253,27 @@
> //pref("editor.encode_entity",                 "html");
> 
> pref("editor.resizing.preserve_ratio",       true);
> pref("editor.positioning.offset",            0);
> 
> pref("dom.max_chrome_script_run_time", 20);
> pref("dom.max_script_run_time", 10);
> 
>+#ifndef DEBUG
> // How long a plugin is allowed to process a synchronous IPC message
> // before we consider it "hung".
>-#ifndef DEBUG
> pref("dom.ipc.plugins.timeoutSecs", 45);
>+// How long a plugin launch is allowed to take before
>+// we consider it failed.
>+pref("dom.ipc.plugins.processLaunchTimeoutSecs", 45);
> #else
> // No timeout in DEBUG builds
> pref("dom.ipc.plugins.timeoutSecs", 0);
>+pref("dom.ipc.plugins.processLaunchTimeoutSecs", 0);
> #endif

Looks good, thanks.  Might need something like sr for the new pref, pinging bsmedberg.
Attachment #460787 - Flags: superreview?(benjamin)
Attachment #460787 - Flags: review?(jones.chris.g)
Attachment #460787 - Flags: review+
I'm not sure why we need the new pref, could we just reuse dom.ipc.plugins.timeoutSecs? But really, not a big deal either way.
Attached patch fixes nits (obsolete) — Splinter Review
From bsmedberg's comments in IRC, I think this is ready to go.
Attachment #460787 - Attachment is obsolete: true
Attachment #460787 - Flags: superreview?(benjamin)
Keywords: checkin-needed
This can't be checked in without approval2.0+ or blocking2.0+, right?  This shouldn't be marked checkin-needed until it's got one of those & can actually be checked in.  (/me removes checkin-needed keyword; feel free to re-add if I'm missing something)
Keywords: checkin-needed
blocking2.0: --- → betaN+
Keywords: checkin-needed
Attached patch hg exported (obsolete) — Splinter Review
fixing up the header
Attachment #461731 - Attachment is obsolete: true
Attached patch hg exportedSplinter Review
Attachment #464567 - Attachment is obsolete: true
http://hg.mozilla.org/mozilla-central/rev/89ef5b7a1f82
Status: NEW → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: