Open Bug 961035 Opened 11 years ago Updated 2 years ago

Pass a reasonable timeout to EnsureProcessTerminated

Categories

(Core :: IPC, defect, P3)

x86_64
macOS
defect

Tracking

()

People

(Reporter: RyanVM, Unassigned)

References

Details

While investigating OSX debug mochitest-4 shutdown hangs in bug 960605, Benjamin made some observations about suspicious IPC behavior. (In reply to Benjamin Smedberg [:bsmedberg] from comment #1) > 08:55:16 INFO - 1 XUL!(anonymous > namespace)::ChildLaxReaper::WillDestroyCurrentMessageLoop() > [process_watcher_posix_sigchld.cc : 68 + 0xb] > > This function by design waits *forever* for child processes to exit. > > So there are a couple questions here: > * why aren't we passing force=true to > ProcessWatcher::EnsureProcessTerminated? that would cause us to wait only > two seconds and then kill misbehaving children, instead of waiting forever > * what is the child process doing? > > Debugging information that might help includes: > * the PID of the process we're waiting for > * the log of when we launch child processes (content and/or plugin) > * the state/stack trace of the child process that we're waiting on The log in question: https://tbpl.mozilla.org/php/getParsedLog.php?id=33109253&tree=Fx-Team
I can't tell if there's anything actionable here or not.
Flags: needinfo?(jmathies)
This is probably NOTABUG, but I'm wondering if there's a reason why there's no timeout in this case (which helped give us some fun times in bug 933680) instead of just a much-longer-than-normal one.
Yeah we added this infinite wait for what appears to be log output from automation test runs? We could set this to something more sane, maybe 15 seconds or 30 seconds? It doesn't look like this is going to cause problems for users though, from looking at the code. 128 ProcessWatcher::EnsureProcessTerminated(mChildProcessHandle 129 #ifdef NS_FREE_PERMANENT_DATA 130 // If we're doing leak logging, shutdown can be slow. 131 , false // don't "force" 132 #endif wrapped around defined(NS_BUILD_REFCNT_LOGGING) || defined(MOZ_VALGRIND) || defined(MOZ_ASAN): http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nscore.h#238 default is true for normal builds: http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/chrome/common/process_watcher.h#26
Flags: needinfo?(jmathies)
Flags: needinfo?(ryanvm)
Adding a timeout of like a minute sounds reasonable to me, though note that we'll get an orange anyways in debug builds because we won't have a leak log. LSan will just silently stop reporting leaks which is not good. I'm not sure how long it takes for it to run.
I think coming up with a sane default is something the Platform folks should decide on.
Flags: needinfo?(ryanvm)
Priority: -- → P3
Summary: Somebody is calling EnsureProcessTerminated incorrectly → Pass a reasonable timeout to EnsureProcessTerminated

Is this still relevant?

Flags: needinfo?(jld)

(In reply to [Away until April 25] Ryan VanderMeulen from comment #7)

Is this still relevant?

It is still relevant, and we have a few other bugs about this and related issues. What mccr8 said in comment #5 is basically my understanding: if we make IPC time out here, then we'll break the test run anyway. I think what we actually want are crash reports for those processes, but this is currently very late in shutdown so that might not work.

There are some comments about IPC shutdown cleanup in bug 1724337, and that seems to be the “see also” hub for this family of problems.

Flags: needinfo?(jld)
See Also: → 1724337
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.