Closed Bug 544095 Opened 16 years ago Closed 15 years ago

Need strings / plan for potential "plugin hung" UI

Categories

(Core :: IPC, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: Dolske, Assigned: Dolske)

References

Details

Attachments

(6 obsolete files)

Bug 538910 is implementing the "plugin crashed" UI. Bug 540004 will be adding support for detecting/killing plugins that have not _crashed_ but are hung / unresponsive. AIUI, bsmedberg would like to minimize branch string churn by landing the strings needed for this UI on Lorentz (3.6.x), so that it's easier to add the UI implementation later if we want to. So we'll need to know what we want this UI to do (eg modal popup vs. notification bar?) and say. Limi, can you make a proposal here? [I also made the edgy suggesting in bug 540004 comment 2 to just treat hangs the same as crashes, but having Plan B strings even if we do that would be a good idea.] Doing in-content UI is probably possible -- will we still have the last bitmap drawn by the plugin? We could overlay a blurred / semitransparent dark overlay, similar to what we do with <video> while it's buffering. [Will we reliably know that a plugin in this state is actually stuck and not just intensively rendering a game or something? We'd probably want to avoid interrupting the user if that's the case.]
Assignee: nobody → dolske
FWIW, there is no possibility for the hang detector to use a notification bar or blurs or anything fancy. It has to be an app-modal dialog, and the only options are "wait" and "kill the plugin". It may not even be possible to use XUL to present the dialog: we may have to use native UI (MessageBox).
Eww. Well, we can still plan for the possibility of such UI, but IMO that strengthens the argument to just kill hung plugins automatically without prompting. Do we know of any cases where such hangs are more-or-less expected/normal, and for what duration? (Loading Java, maybe?)
Well... I'm more worried about the hang detector going off in cases where it shouldn't, because of our programming or logic error, but perhaps that's an unrealistic fear. There are certainly cases where e.g. we're attaching to the plugin process in a debugger (or using the Flash debugging tools) where we want to be able to suppress the hang detector, but we could just do that with a pref. The other case where this could be a problem is where the plugin pops up a nested event loop (such as a modal dialog box of some sort). I believe we have plans to spin the event loop on the other side to avoid hangs, but we need to make sure that this cancels or modifies the hang detector behavior.
Just chatted with Karl, and it appears that we can play some games to detect nested glib event loops too. And to reiterate bsmedberg's point, we're going to need to be extremely careful with what we use for hang dialog; a XUL dialog, or anything else that might allow JS to run, makes me squeamish. Unfortunately, I think it *does* need to be an app-modal dialog, because any instance of a particular plugin, in any tab, might trip the hang detector. I'd prefer the detector to be off by default in DEBUG builds. One issue that comes to mind: will the hang detector interact badly with the slow-script detector? Let's say that script S calls into plugin P, and during the call into P, S exceeds the script timeout and the watchdog sets the JS interrupt. Let's say that plugin P also trips the hang detector, and we return back into JS land. Would it be possible to get a hang detector dialog immediately followed by a slow-script dialog? I would find that annoying. I don't have any strong preference for the hang UI itself, something like the slow-script dialog seems fine with me. I would caution against an "Always take this action" checkbox that could permanently disable the hang detector (slow-script doesn't have that either). Not sure how we could integrate an "always-kill" option, but I'd be fine with that. In any case where we kill the plugin because of a hang, I would like some indication; door hanger, different "sad plugin" icon, whatever is fine.
(In reply to comment #4) > I'd prefer the detector to be off by default in DEBUG builds. > Another option might be, in DEBUG builds, to provide an additional option [Continue waiting] [Stop] [Debug this hang] that kills the plugin process (thereby triggering the debugger there), and then DebugBreak()s in the browser process.
(In reply to comment #5) > (In reply to comment #4) > > I'd prefer the detector to be off by default in DEBUG builds. > > > > Another option might be, in DEBUG builds, to provide an additional option > > [Continue waiting] [Stop] [Debug this hang] > > that kills the plugin process (thereby triggering the debugger there), and then > DebugBreak()s in the browser process. Hmm. For the case where we already have the plugin process in a debugger, we'd probably want *another* button in DEBUG builds. [Continue waiting] [Stop] [Ignore] [Debug this hang] where "Continue waiting" resets the timeout a la slow-script, and "Ignore" sets the timeout to 0 (for the rest of the browsing session?).
Another question is whether we want killing the plugin to trigger a crash report for the plugin process. I think it should, if Ben and Jim think we could get useful enough information from it without a browser-side stack. This is pretty trivial on linux, but I understand it's trickier elsewhere. Ted, is it "easy enough" on windows?
Ideally I'd like the hang detector to collect stacks for both processes. I believe we have or could easily have a function to "create a minidump and continue execution".
Hmm, yeah, breakpad has such an interface but nsExceptionHandler doesn't expose it.
Issue with hang detection: what should happen if the plugin responds while the hang dialog is up? Unlike slow-script freezing JS, we can't lock up the plugin process. If the user selects "Continue waiting" there's no problem. I'm not sure what should happen if the user selects "Stop" though. Easiest would be to haul off and kill the plugin anyway. With a little fancy footwork, we could detect that the reply came in while the dialog was up, and just continue on as if nothing happened, but I wonder if that would be confusing for users ...
(In reply to comment #10) > Issue with hang detection: what should happen if the plugin responds while the > hang dialog is up? Probably just do what the use says; self-dismissing dialogs would be kind of confusing, and if you have a plugin that's stuck in a loop of "hang for 30 seconds, respond briefly", you probably want to kill it even if it's "responding".
(In reply to comment #9) > Hmm, yeah, breakpad has such an interface but nsExceptionHandler doesn't expose > it. You're out-of-process, it should be simple enough to use the same API you're currently using to just write a minidump for the child process, no?
For the windows hangs, it'd be most useful to grab a browser-process minidump, then kill the plugin process in such a way that it also generates a minidump. On linux at least, we can just call |gExceptionHandler->WriteMinidump()| within the browser process and it'll work like magic, although we'll need to shuffle around some callback code. Not so sure about windows.
I played around with a bare-bones Gtk2 hang dialog, and the results were not good. I couldn't find a (simple) way to use vanilla gtk APIs to block timer and IO events from the dialog's nested main event loop. Karl tells me that Gtk is pretty closely tied to a single, main glib event loop, so creating another event loop (even on another thread) doesn't look promising either. It chills me to the bone to think what would be required on Windows. So, I think for a release "soon", we have the following realistic options, in order of feasibility (1) Kill the plugin process the first time it hangs, then throw up a dialog asking whether that was the right thing to do. Respect that decision from there on. (2) Instead of popping up a ghetto dialog in the browser process, launch a subprocess to do it. This actually surprisingly easy. Jim/Ben, would this contribute to the deadlock-o-rama on Windows? (3) Finish remoting the prompt service. This is nice because then we could make a XUL dialog with localized strings. Benjamin, I thought I remembered hearing that this was underway; what's the current status? I couldn't find any open bugs.
(3) would be bug 516749.
Thanks. Drat, didn't look through the e10s-mobile bugs. PromptService doesn't quite look like what we'd need, so IMHO it's down to (1) or (2).
I think #2 is better, and is similar to what we already do for assertions (windbgdlg).
I tried to make this hacky version as much like the JS slow-script dialog as possible, to the point of copying and pasting its language. l10n should hopefully just be a matter of s/$script/$plugin/. UX issue worth discussing: in all the code dealing with this dialog, I made the failure mode be "kill plugin". This seems to be what the slow-script dialog does, and I think it makes sense. Right now the plugin library file path is displayed where the description should be, but I expect that after Justin's crash dialog lands we can just swap in the pretty name.
In the first version, a SIGSTOP'd subprocess would not have been cleaned up. Also inverted |if (parentProcess)| to |if (childProcess)| for readability.
Attachment #425599 - Attachment is obsolete: true
Attachment #425606 - Flags: review?(benjamin)
Attachment #425599 - Flags: review?(benjamin)
Comment on attachment 425596 [details] [diff] [review] Part 1: Plugin-timeout-dialog stubs and code shared across platforms. + aDialogMessage += NS_LITERAL_STRING(""); Ineffective line can be removed. Please put the declaration of AskUserToWaitFor in a header file (probably PluginMessageUtils.h) and use /** style doc comments */.
Attachment #425596 - Flags: review?(benjamin) → review+
Comment on attachment 425606 [details] [diff] [review] Part 3: Build and use the Gtk timeout dialog, v2. Don't recurse through timeout-dialog/Makefile, just do DIRS = timeout-dialog/gtk2
Attachment #425606 - Flags: review?(benjamin) → review+
Comment on attachment 425597 [details] [diff] [review] Part 2: Gtk plugin-timeout dialog hackery. >+ gtk_label_set_line_wrap_mode(label, PANGO_WRAP_WORD); >+ gtk_label_set_justify(label, GTK_JUSTIFY_LEFT); These are the defaults so no need to set these, though it doesn't hurt, so I'll leave it to you to decide. GTK_JUSTIFY_LEFT gets switched to PANGO_ALIGN_RIGHT in rtl mode, so hopefully GTK is doing the right thing for us. >+ GtkBox* outerBox = GTK_BOX(gtk_vbox_new( >+ FALSE, // non-homogenous spacing >+ 0)); // padding (pixels) >+ gtk_box_pack_start(outerBox, GTK_WIDGET(contentBox), >+ TRUE, // expand >+ TRUE, // don't fill, >+ 10); // padding I don't know why the outerBox would be needed as I assume this is a child of a vbox anyway, but I'm not familiar with these, so maybe it is needed for some reason? >+ GtkContainer* dialogArea = >+ GTK_CONTAINER(gtk_dialog_get_content_area(GTK_DIALOG(dialog))); gtk_dialog_get_content_area is only available in GTK+-2.14 and newer. Currently we need to support GTK+-2.10, and especially if this is going to be backported to 1.9.2 then raising the support bar is probably not an option. Example 4 in the older docs uses gtk_container_add (GTK_CONTAINER (GTK_DIALOG(dialog)->vbox), label); http://library.gnome.org/devel/gtk/2.10/GtkDialog.html so I think we should go with that. >+ case GTK_RESPONSE_ACCEPT: >+ continueWaiting = false; break; >+ >+ case GTK_RESPONSE_REJECT: >+ continueWaiting = true; break; >+ >+ default: >+ warning("No response from user, falling back on default"); >+ continueWaiting = false; >+ break; >+ } If the user closes the dialog with the window manager's close button, GTK_RESPONSE_DELETE_EVENT will fall through to the default. I would have thought/expected that closing the dialog meant a reject (or cancel), but I'm interested to hear other viewpoints. I think it's worth including an explicit GTK_RESPONSE_DELETE_EVENT case label because this is an intentional user action.
(In reply to comment #26) > If the user closes the dialog with the window manager's close button, > GTK_RESPONSE_DELETE_EVENT will fall through to the default. > > I would have thought/expected that closing the dialog meant a reject (or > cancel), but I'm interested to hear other viewpoints. > IMHO closing the dialog should stop the plugin. If one closes the JS slow-script dialog, the script is stopped. That makes sense to me.
Addressed comments. Note that in this version the user closing the dialog will result in the plugin being killed (as in the previous version).
Attachment #425597 - Attachment is obsolete: true
Attachment #425879 - Flags: review?(karlt)
Attachment #425597 - Flags: review?(karlt)
Comment on attachment 425879 [details] [diff] [review] Part 2: Gtk plugin-timeout dialog hackery, v2. If I offered a dialog to someone offering $100 and they just closed the dialog, I'd consider that a rejection, but maybe I'm just insecure that way ;). I can understand why making killing the plugin is appealing, though. It's just confusing if the response to a close is simply what we think is best. Maybe it depends on the wording of the dialog. Let's not block on this.
Attachment #425879 - Flags: review?(karlt) → review+
My opinion is that the point of the dialog is to ask "Are you really really sure you want the browser to remain useless while it waits on this (likely) misbehaving plugin?" So I would consider closing the dialog to indeed be a rejection, but the rejection is "no, I really don't want to wait any more."
Comment on attachment 425879 [details] [diff] [review] Part 2: Gtk plugin-timeout dialog hackery, v2. Requesting ui-r based on Justin's advice. I don't know what this entails. If it'd be better done after we have localizable strings, please let me know.
Attachment #425879 - Flags: ui-review?(faaborg)
Attachment #425879 - Flags: ui-review?(faaborg) → ui-review-
Comment on attachment 425879 [details] [diff] [review] Part 2: Gtk plugin-timeout dialog hackery, v2. Just had a conversation about this with dolske and limi, the three of us agree that the best interface from the user's perspective is for us to kill the plugin, and then use the in content UI to explain that the plugin crashed. Reasons include: 1) user's won't be able to make an informed decision on if they want to give the plugin more time or kill it, since they obviously don't know what is going on behind the scenes. 2) User's won't be able to correctly attribute in the dialog box who is speaking, who crashed, and what is wrong. Every plugin crash will become "Firefox crashed, I wish it was more stable" because they are using Firefox, and Firefox produced a dialog box that said "broken."
Additional context for the second point, there are a lot of internal Web pages inside [super technical company] that trigger Firefox's javascript is unresponsive dialog box. I've been repeatedly told that "*Firefox crashes* on a bunch of internal [super technical company] web sites" by employees of [super technical company]. I've never heard "who ever created these internal web sites at [super technical company] didn't do a very good job, and Firefox is just pointing that out." Part of being the higher level thing is that we generally pick up all the blame if we announce that something else is broken.
Comment on attachment 425879 [details] [diff] [review] Part 2: Gtk plugin-timeout dialog hackery, v2. Justin/Alex/Alexander, if we want hung plugins to be treated like crashes, is there any new UI required?
Attachment #425879 - Attachment is obsolete: true
I think we can reuse the existing UI, the difference between a "hang" and "crash" isn't going to be meaningful to most users. And it sounds like we want it to act the same way too.
yeah, no difference, hangs turn into crashes when we actively crash them instead of leaving them hanging.
Calling this WFM; we're using the existing crashed-plugin UI and not adding special native UI for hung plugins.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
(In reply to comment #33) > Part of being the higher level thing is that we > generally pick up all the blame if we announce that something else is broken. Of course (20-20 hindsight now, although I wasn't looking at this bug when it was going), killing the process does not save us from being blamed, if the plugin was not actually wedged. The slow-script dialog at least gives the user a way to continue (if her machine were temporarily bogging, this might be enough to be forgiven). No easy answers here, although the bigger timeout is easy enough to engineer. /be
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: