Closed Bug 1118618 Opened 5 years ago Closed 5 years ago

[e10s] Slow script/plugin hang UI

Categories

(Core :: DOM: Content Processes, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
e10s m4+ ---

People

(Reporter: billm, Assigned: billm)

References

Details

Attachments

(3 files, 1 obsolete file)

Here's the plan for this bug:

1. If a web page has a slow script, we tell the chrome process and it puts up a UI describing the problem and giving the user the option to terminate the script, debug it, kill the content process, or ignore the slow script.

2. If a plugin hangs, we tell the chrome process and it puts up a UI describing the problem and giving the user the option to kill the plugin, kill the content process, or ignore the hang.

3. If there chrome process is waiting for a CPOW when we want to display the hang UI, then we time out the CPOW.

I'll file a separate bug to figure out the best UX for this. Right now I have a simple UI that I think we should start with.
Attached patch platform-changesSplinter Review
There's a comment in ProcessHangMonitor.cpp that describes the architecture. I hope it's sufficient.
Attachment #8545028 - Flags: review?(mrbkap)
Attached patch frontend-changes (obsolete) — Splinter Review
This is pretty straightforward. There's a comment in nsIHangReport.idl in the previous patch that probably would be helpful to read.
Attachment #8545029 - Flags: review?(mconley)
Almost forgot this patch. It allows us to find out when a synchronous IPC starts and ends.
Attachment #8545403 - Flags: review?(bent.mozilla)
Duplicate of this bug: 1034213
Comment on attachment 8545029 [details] [diff] [review]
frontend-changes

Review of attachment 8545029 [details] [diff] [review]:
-----------------------------------------------------------------

This is really exciting stuff, Bill! Very, very cool.

Just a few notes. Thanks!

::: browser/base/content/browser.js
@@ +1192,5 @@
>      Services.obs.addObserver(gXPInstallObserver, "addon-install-complete", false);
>      window.messageManager.addMessageListener("Browser:URIFixup", gKeywordURIFixup);
>      window.messageManager.addMessageListener("Browser:LoadURI", RedirectLoad);
>  
> +    ProcessHangMonitor.init();

It looks like ProcessHangMonitor is only supposed to be initted once globally. Initializing it should probably be moved to nsBrowserGlue.js, since that's where we init global things instead of per-window things.

::: browser/base/content/tabbrowser.xml
@@ +1489,5 @@
>  
>              if (wasActive)
>                aBrowser.focus();
>  
> +            var evt = document.createEvent("Events");

let, not var

::: browser/locales/en-US/chrome/browser/browser.dtd
@@ +825,5 @@
>  <!ENTITY panicButton.thankyou.msg1                "Your recent history is cleared.">
>  <!ENTITY panicButton.thankyou.msg2                "Safe browsing!">
>  <!ENTITY panicButton.thankyou.buttonlabel         "Thanks!">
>  
> +<!ENTITY processHang.terminateScript.label        "Terminate JavaScript">

We might want to run some of these strings by UX or Matej Novak to make sure we've got the terminology right for our users. There's some pretty scary technical language in here, and the audience isn't going to just be developers.

::: browser/locales/en-US/chrome/browser/browser.properties
@@ +435,5 @@
>  dataReportingNotification.button.label  = Choose What I Share
>  dataReportingNotification.button.accessKey  = C
>  
> +# Process hang reporter
> +processHang.message       = A web page is causing %1$S to run slowly. What would you like to do?

Nit - I don't think the spaces in front of = add much, unless you want to align it with the other items. I wouldn't worry about aligning though.

::: browser/modules/ProcessHangMonitor.jsm
@@ +10,5 @@
> +let Cu = Components.utils;
> +
> +this.EXPORTED_SYMBOLS = ["ProcessHangMonitor"];
> +
> +Cu.import("resource://gre/modules/Services.jsm");

Might as well have this loaded lazily.

@@ +13,5 @@
> +
> +Cu.import("resource://gre/modules/Services.jsm");
> +
> +/**
> + * This JSM is responsible for observing content process hang reports

This is great, but in general I think each method of this module needs to be documented individually.

@@ +29,5 @@
> +  // List of hang reports that haven't expired or been dismissed by
> +  // the user. The list contains pairs [report, timer]. Each time the
> +  // hang is reported, the timer is refreshed so it expires after
> +  // HANG_EXPIRATION_TIME.
> +  _activeReports: [],

Instead of an Array, this is probably a great place to use a Map that keys on reports to timers.

@@ +90,5 @@
> +      callback: null,
> +    }];
> +
> +    nb.appendNotification(message, "process-hang",
> +                          "chrome://browser/content/aboutRobots-icon.png",

This is probably fine for now, but we should file a follow-up bug to make sure UX gets a chance to supply something better if they have something.

@@ +109,5 @@
> +    let frameLoader = browser.QueryInterface(Ci.nsIFrameLoaderOwner).frameLoader;
> +    let reportToShow = null;
> +    for (let [report, timer] of this._activeReports) {
> +      if (report.isReportForBrowser(frameLoader)) {
> +        reportToShow = report;

Might as well just return report here, and return null at the end of the function? Not sure we need reportToShow.

@@ +132,5 @@
> +    let callback = {
> +      handleResult: (pref) => {
> +        ignore = pref.value;
> +      },
> +      handleCompletion: () => {

I guess we don't care about the reason passed to this function?

@@ +159,5 @@
> +
> +    // If a new tab is selected or if a tab changes remoteness, then
> +    // we may need to show or hide a hang notification.
> +
> +    if (event.type == "TabSelect") {

Make this an or?

@@ +172,5 @@
> +    win.gBrowser.tabContainer.addEventListener("TabRemotenessChange", this, true);
> +  },
> +
> +  untrackWindow: function(win) {
> +    win.gBrowser.tabContainer.removeEventListener("TabSelect", this, true);

We need to untrack a window when it closes, otherwise I think we're going to leak. Where do we monitor for window closing so we can untrack?

@@ +176,5 @@
> +    win.gBrowser.tabContainer.removeEventListener("TabSelect", this, true);
> +    win.gBrowser.tabContainer.removeEventListener("TabRemotenessChange", this, true);
> +  },
> +
> +  updateWindows: function() {

Nit - can you please put this closer to updateWindow?

@@ +302,5 @@
> +        }
> +        let uri = browser.currentURI.spec;
> +        cps.set(uri, "browser.hangs.ignore.slow_script", true, browser.loadContext);
> +      } else {
> +        cps.setGlobal("browser.hangs.ignore.plugin", true, null);

So we're going to ignore plugin hangs for all domains if the user chooses to ignore for a single domain?

Or are you thinking along the lines of, "we got a Flash hang, ignore all Flash hangs from here on out"?

If so, I'm worried that we're mixing mental models for the user here. Some choices are going to be applied to domains, and some are going to be applied globally. That feels confusing.
Attachment #8545029 - Flags: review?(mconley) → review-
New patch.
Attachment #8545029 - Attachment is obsolete: true
Attachment #8546108 - Flags: review?(mconley)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #5)
> Comment on attachment 8545029 [details] [diff] [review]
> frontend-changes
> 
> Review of attachment 8545029 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> We might want to run some of these strings by UX or Matej Novak to make sure
> we've got the terminology right for our users. There's some pretty scary
> technical language in here, and the audience isn't going to just be
> developers.

I filed bug 1119442 for UX. I also made the terminology a little simpler.

> ::: browser/modules/ProcessHangMonitor.jsm
> @@ +10,5 @@
> > +let Cu = Components.utils;
> > +
> > +this.EXPORTED_SYMBOLS = ["ProcessHangMonitor"];
> > +
> > +Cu.import("resource://gre/modules/Services.jsm");
> 
> Might as well have this loaded lazily.

I don't think it makes sense to load Services lazily. Pretty much every module uses it, so it's already going to be loaded. The only savings is the cost of loading an already loaded module, which should be cheap.

> @@ +172,5 @@
> > +    win.gBrowser.tabContainer.addEventListener("TabRemotenessChange", this, true);
> > +  },
> > +
> > +  untrackWindow: function(win) {
> > +    win.gBrowser.tabContainer.removeEventListener("TabSelect", this, true);
> 
> We need to untrack a window when it closes, otherwise I think we're going to
> leak. Where do we monitor for window closing so we can untrack?

When a window is closed, Gecko automatically disconnects all its event listeners. So there isn't any need for this.

> @@ +302,5 @@
> > +        }
> > +        let uri = browser.currentURI.spec;
> > +        cps.set(uri, "browser.hangs.ignore.slow_script", true, browser.loadContext);
> > +      } else {
> > +        cps.setGlobal("browser.hangs.ignore.plugin", true, null);
> 
> So we're going to ignore plugin hangs for all domains if the user chooses to
> ignore for a single domain?
> 
> Or are you thinking along the lines of, "we got a Flash hang, ignore all
> Flash hangs from here on out"?
> 
> If so, I'm worried that we're mixing mental models for the user here. Some
> choices are going to be applied to domains, and some are going to be applied
> globally. That feels confusing.

Yeah, the ignore thing kinda bothered me. I really didn't like how you can't stop ignoring hangs. The content pref service unfortunately doesn't have any kind of generic UI, and I don't know how we would expose that otherwise.

I don't think ignoring hangs is all that useful. They should be rare and you can always just close the box. So I removed all the ignore machinery.
Comment on attachment 8546108 [details] [diff] [review]
frontend-changes v2

Review of attachment 8546108 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is fine, once the last set of things I found get fixed.

::: browser/base/content/browser.js
@@ +148,5 @@
>  
>  XPCOMUtils.defineLazyModuleGetter(this, "PageThumbs",
>    "resource://gre/modules/PageThumbs.jsm");
>  
> +// Needed by browser.xul.

TBH, I don't think this comment adds much value.

::: browser/modules/ProcessHangMonitor.jsm
@@ +244,5 @@
> +   * Install event handlers on |win| to watch for events that would
> +   * cause a different hang report to be displayed.
> +   */
> +  trackWindow: function(win) {
> +    //win.gBrowser.tabContainer.addEventListener("TabSelect", this, true);

Commented out code... this should probably be put back in? Or, the untrackWindow function needs to get the removeEventListener taken out.

@@ +276,5 @@
> +      timer.cancel();
> +      timer.initWithCallback(this, HANG_EXPIRATION_TIME, timer.TYPE_ONE_SHOT);
> +    }
> +
> +    // Otherwise create a new timer and display the report.

This doesn't look right - we're going to fall through and create a new timer and map it to this report even if we already had one. Was this supposed to be in an else?
Attachment #8546108 - Flags: review?(mconley) → review+
Comment on attachment 8545403 [details] [diff] [review]
sync-notification

Review of attachment 8545403 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8545403 - Flags: review?(bent.mozilla) → review+
Comment on attachment 8545028 [details] [diff] [review]
platform-changes

Review of attachment 8545028 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good. A couple of nits.

::: dom/base/nsGlobalWindow.cpp
@@ +10965,5 @@
> +                                       filename.get(),
> +                                       lineno);
> +    if (action == ProcessHangMonitor::Terminate) {
> +      return KillSlowScript;
> +    } else if (action == ProcessHangMonitor::StartDebugger) {

Nit: here and below: nix the else-after-returns.

::: dom/ipc/ProcessHangMonitor.cpp
@@ +190,5 @@
> +
> +template<>
> +struct RunnableMethodTraits<HangMonitorChild>
> +{
> +    typedef HangMonitorChild Class;

Nit: these traits structs should use two-space indentation.

@@ +387,5 @@
> +}
> +
> +void
> +HangMonitorChild::ClearHang()
> +{

Can we assert IsMainThread here?
Attachment #8545028 - Flags: review?(mrbkap) → review+
Had to back this out because it seems to be slowing down b2g ICS debug xpcshell tests to the point where they sometimes time out. A bit strange since it's disabled on b2g, but maybe it's real. I'll investigate.

https://hg.mozilla.org/integration/mozilla-inbound/rev/13c3bf8c3e21
https://hg.mozilla.org/mozilla-central/rev/1810fd0f78b4
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Relanded in one changeset (I was lazy).
https://hg.mozilla.org/integration/mozilla-inbound/rev/63680efe6d55

I had to set the max script run time to 0 on b2g. Otherwise I guess we trigger the interrupt callback too much and it's too slow.
https://hg.mozilla.org/mozilla-central/rev/63680efe6d55
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Duplicate of this bug: 588293
¡Hola Bill!

Is there some threshold to be tweaked?

I do see the new UI quite a few time on Nightly's start-up and I'm pretty sure I do see this way more often than the old modal window.

¡Gracias!
Flags: needinfo?(wmccloskey)
(In reply to alex_mayorga from comment #22)
> I do see the new UI quite a few time on Nightly's start-up and I'm pretty
> sure I do see this way more often than the old modal window.

If you're seeing the UI after waking up your laptop, then that's bug 1114345.

However, we also decreased the threshold for when it appears. The old slow script dialog appeared after 10 seconds. This one appears after 3 seconds. That number may require some tweaking. However, unlike the old dialog, the script continues to run while the pop-up is shown, so it doesn't really interrupt what the user is doing so much.

What was happening on your computer when the pop-up appeared? Three seconds is still a long time for a script to run. Does your computer swap a lot? Were there programs running in the background?
Flags: needinfo?(wmccloskey) → needinfo?(alex_mayorga)
¡Hola Bill!

As it is when I try to move the pointer to the button the notification would be gone by the time I reach it so it is rather confusing. Perhaps the notification bar should stick around until dismissed?

I see it pop up a lot during a session restore of not to big sessions (i.e. a couple windows and less than 20 tabs total). My feature request is for it not to pop-up that many times when doing a session restore.

The physical memory is at 70% tops so is not really swapping.

Did you try this with e10s?
Flags: needinfo?(alex_mayorga) → needinfo?(wmccloskey)
Hopefully bug 1123956 will fix some of these issues.

The popup is supposed to stay around for 5 seconds after the hang ends. I've increased that to 10 seconds. If that's not enough, I guess we can keep it around until the user closes it.

> Did you try this with e10s?

Yes. This bug is specific to e10s.
Flags: needinfo?(wmccloskey)
Duplicate of this bug: 1096664
Duplicate of this bug: 1059494
Duplicate of this bug: 1124063
You need to log in before you can comment on or make changes to this bug.