Closed
Bug 540004
Opened 14 years ago
Closed 14 years ago
Implement an IPDL hang detector
Categories
(Core :: IPC, defect)
Core
IPC
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
status1.9.2 | --- | .4-fixed |
People
(Reporter: benjamin, Assigned: cjones)
References
Details
(Whiteboard: [fixed-lorentz])
Attachments
(4 files, 1 obsolete file)
15.73 KB,
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
16.01 KB,
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
6.84 KB,
patch
|
Details | Diff | Splinter Review | |
7.30 KB,
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
If a child process hangs, we should be able to notice that and prompt the user to kill it after some timeout period. It seems that this would be fairly easy to do with sync or RPC style calls... probably less easy if everything is async, e.g. chrome->content. But for now, RPC for plugin processes should be sufficient. I'd like to get this hooked up to UI for Firefox Lorentz if possible. We're limited to putting up an app-modal prompt, something like: "The plugin 'Adobe Flash Player' is not responding after 60 seconds. Stopping the plugin allows you to use Firefox again." [Stop] [Wait] Limi, can you propose better strings?
Assignee | ||
Comment 1•14 years ago
|
||
The low-level impl of this is actually quite trivial, basically s/condvar.Wait()/condvar.Wait(kTimeout)/.
Comment 2•14 years ago
|
||
Edgy proposal: what if we just kill the non-responsive plugin and treat it as a crash? Most users would (arguably) consider a crash and hang to be basically the same thing, anyway.
Reporter | ||
Comment 3•14 years ago
|
||
That is edgy: we would at least need a pref to disable that behavior, for developers using the flash debugger. We'd also need to be really certain of ourselves.
Reporter | ||
Comment 4•14 years ago
|
||
Too edgy for content processes, certainly, which can hang loading tinderbox pages for far longer than a minute: but perhaps doable for plugins. I'd still like strings available for a dialog, just in case we need to revise that decision during beta/RC.
Assignee | ||
Comment 5•14 years ago
|
||
(In reply to comment #0) > "The plugin 'Adobe Flash Player' is not responding after 60 seconds. Stopping > the plugin allows you to use Firefox again." > > [Stop] [Wait] > Shouldn't we use basically the same language as the slow-script dialog? Also, while this dialog is popped up, are we going to completely block the main-thread event loop? As long as that's the case, the backend for this isn't too terribly hard.
Reporter | ||
Comment 6•14 years ago
|
||
With the slow-script dialog, we just terminate the currently running JS. With the hang detector, I think we have no choice but to terminate the plugin process. And yes, this should be a nested event loop which blocks the main event loop.
Assignee | ||
Updated•14 years ago
|
Assignee: nobody → jones.chris.g
OS: Linux → All
Hardware: x86 → All
Assignee | ||
Comment 7•14 years ago
|
||
Just looking for a rubber stamp r+.
Attachment #425272 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 8•14 years ago
|
||
Attachment #425275 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 9•14 years ago
|
||
Attachment #425277 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 10•14 years ago
|
||
Assignee | ||
Comment 11•14 years ago
|
||
Still need a windows implementation, although we can push everything but the unit test without it.
Assignee | ||
Comment 12•14 years ago
|
||
Blocking this on the new IPDL features that need to land on m-c along with this (to avoid serious merge headaches later).
Assignee | ||
Comment 13•14 years ago
|
||
After playing around with a Gtk dialog for bug 544095, I realized that this patch ends up calling |listener->OnTimeout()| (and thus |actor->ShouldContinueFromReplyTimeout()|) while still holding the *Channel mutex. (The modal event loop was woken up to deliver an event to the plugin :( .) Probably not desirable, but I'll think on it overnight to see if there's any benefit.
Assignee | ||
Comment 14•14 years ago
|
||
Now SyncChannel drops mMutex before calling out to OnTimeout.
Attachment #425277 -
Attachment is obsolete: true
Attachment #425865 -
Flags: review?(bent.mozilla)
Attachment #425277 -
Flags: review?(bent.mozilla)
Updated•14 years ago
|
Attachment #425272 -
Flags: review?(bent.mozilla) → review+
Comment on attachment 425275 [details] [diff] [review] Detect hangs while awaiting synchronous IPC replies. >- mChannelState = ChannelClosing; Doesn't seem like you ever use this again... Simple omission? Or can it be removed from the enum? >diff --git a/ipc/glue/Makefile.in b/ipc/glue/Makefile.in Unnecessary. >+const PRIntervalTime SyncChannel::kNoTimeout = PR_INTERVAL_NO_TIMEOUT; This seems unnecessary. >@@ -201,25 +215,71 @@ SyncChannel::OnChannelError() > if (AwaitingSyncReply()) > NotifyWorkerThread(); > } > > // > // Synchronization between worker and IO threads > // > >+namespace { >+ >+bool >+IsTimeoutExpired(PRIntervalTime aStart, PRIntervalTime aTimeout) >+{ >+ return (aTimeout != SyncChannel::kNoTimeout) && >+ (aTimeout <= (PR_IntervalNow() - aStart)); >+} >+ >+} // namespace <anon> >+ >+bool >+SyncChannel::ShouldContinueFromTimeout() Maybe it's just late or maybe I'm just tired but no matter how I look at this function I can't tell how the if test will ever succeed or how the function can ever return false.
Comment on attachment 425865 [details] [diff] [review] Add a ShouldContinue() interface to IPDL actors that allows to decide how a hang should be treated, v2 >+ void SetReplyTimeout(PRIntervalTime aTimeout) { >+ AssertWorkerThread(); >+ mTimeout = (aTimeout <= 0) ? kNoTimeout : aTimeout; PRIntervalTime is an unsigned int.
Attachment #425865 -
Flags: review?(bent.mozilla) → review+
Comment on attachment 425275 [details] [diff] [review] Detect hangs while awaiting synchronous IPC replies. Ok, this makes sense now with the followup patch.
Attachment #425275 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 18•14 years ago
|
||
Pushed http://hg.mozilla.org/projects/electrolysis/rev/d0e79b58e9b8 Pushed http://hg.mozilla.org/projects/electrolysis/rev/b85532205c1b Pushed http://hg.mozilla.org/projects/electrolysis/rev/6d5e409f594a This is going to temporarily break the windows build. On it. The IPDL/C++ test has been pushed back to bug 545053, because it can't land until we have a windows hang detector.
Whiteboard: [land m-c]
Assignee | ||
Comment 19•14 years ago
|
||
Just to note again that we need to land bug 540111 and bug 540886 before this.
Assignee | ||
Comment 20•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/81ccf85816bc http://hg.mozilla.org/mozilla-central/rev/e70a61a00dad http://hg.mozilla.org/mozilla-central/rev/1fd68671241e http://hg.mozilla.org/mozilla-central/rev/cba43f45ed9e
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 21•14 years ago
|
||
This is contained entirely in ipc/*, fixed in Lorentz by virtue of importing that code directly.
Whiteboard: [land m-c] → [fixed-lorentz]
Comment 22•14 years ago
|
||
Blanket approval for Lorentz merge to mozilla-1.9.2 a=beltzner for 1.9.2.4 - please make sure to mark status1.9.2:.4-fixed
Reporter | ||
Comment 23•14 years ago
|
||
Merged into 1.9.2 at http://hg.mozilla.org/releases/mozilla-1.9.2/rev/84ba4d805430
status1.9.2:
--- → .4-fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•