Closed Bug 1119442 Opened 5 years ago Closed 4 years ago

[e10s][ux] UX for the slow script/plugin hang notification

Categories

(Firefox :: General, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 45
Tracking Status
e10s m8+ ---
firefox45 --- fixed

People

(Reporter: billm, Assigned: jimm)

References

(Depends on 1 open bug)

Details

(Keywords: uiwanted, Whiteboard: [ux-qx][ux])

Attachments

(6 files, 10 obsolete files)

11.55 KB, image/png
Details
16.69 KB, image/png
Details
15.15 KB, image/png
Details
27.37 KB, patch
mconley
: review+
Details | Diff | Splinter Review
6.56 KB, patch
Details | Diff | Splinter Review
5.16 KB, patch
Details | Diff | Splinter Review
This code is being added in bug 1118618. It replaces the slow script dialog and the plugin hang dialog. There are a lot of options that users can take and we need to describe them in ways that are easy to understand.

Besides what's in the bug right now, it's also possible to display:
- the filename and line number of the script that is slow
- the name of the plugin that is hung

I left that out in the first version to avoid cluttering things up, but we might want to include that information.
tracking-e10s: --- → +
Flags: needinfo?(madhava)
Flags: firefox-backlog?
Flags: needinfo?(philipp)
Flags: needinfo?(madhava)
Flags: firefox-backlog?
Flags: firefox-backlog+
Gavin/Brad: Were you needinfo'ing me for specific feedback, or just to put this bug on my radar?
FWIW, I've added this to our list of UX bugs.
Flags: needinfo?(philipp) → needinfo?(blassey.bugs)
Summary: [e10s] UX for the slow script/plugin hang notification → [e10s][ux] UX for the slow script/plugin hang notification
Whiteboard: [ux-qx][ux]
To get it on your radar. Bill is going to land something temporary in bug 1118618
Flags: needinfo?(blassey.bugs)
Attached image Screenshot of orphaned menu (obsolete) —
With the current UI as an infobar, if the infobar goes away while the menu is open (in the case that the webpage's slow script finishes executing), then the menu will stay open.

I'm not sure what would happen if I chose "stop script" or "debug script" here.

We could probably close the menupopup at the same time that the infobar is going away.
For me I don't even get a chance to get to any menu, it vanishes before I can react. It leaves me lost and confused.
It might also be a good idea to show something when the script has completed. We could even do something more whimsy like "Kicking the tires..." changing to "... And we're back!"
Duplicate of this bug: 1124063
Duplicate of this bug: 1124263
Aaron, can you comment on the functionality lost from replacing the plugin hang UI with the e10s notification bar
Flags: needinfo?(aklotz)
I think that from a UX perspective this is great. My only concern would be that we need to be sure that, on Windows, a windowed plugin that is hung does not also hang the chrome window due to input queue synchronization.
Flags: needinfo?(aklotz)
The one difference that I'll point out compared to the pop-up dialog as that with the dialog we had a "don't show this again" checkbox. I'm not sure how relevant that still is in the case of a notification bar; I'll leave that to UX to decide! :-)
The functional work on this is complete, and we're out to aurora so it's time to get this done.  	

:Gijs can you help in getting this inserted into the firefox front end work queue?
Flags: needinfo?(gijskruitbosch+bugs)
OS: Linux → All
Hardware: x86_64 → All
Version: 34 Branch → Trunk
(In reply to Jim Mathies [:jimm] from comment #11)
> The functional work on this is complete, and we're out to aurora so it's
> time to get this done.

I'll just note that "it's on aurora, now let's do the frontend bit" doesn't work so well if the frontend needs new l10n strings...

> :Gijs can you help in getting this inserted into the firefox front end work
> queue?

I don't have time to do this myself "soon", so I'm going to redirect this to dolske in the hope that he has a better overview. From quickly skimming the bug, it actually sounds like we want to have a design first, maybe?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(dolske)
(In reply to :Gijs Kruitbosch from comment #12)
> (In reply to Jim Mathies [:jimm] from comment #11)
> > The functional work on this is complete, and we're out to aurora so it's
> > time to get this done.
> 
> I'll just note that "it's on aurora, now let's do the frontend bit" doesn't
> work so well if the frontend needs new l10n strings...

We are holding e10s at each train for two releases. Current roll out schedule has e10s reaching release in 42 or 43. 42 is currently on nightly, hence why I'm trying to bring attention to this.

https://wiki.mozilla.org/Electrolysis#Schedule
Attachment #8551797 - Attachment is obsolete: true
Attached image chrome popup
Attached image slow script dropdown (obsolete) —
Attached image plugin notification
Is this something for developers only? I'm not sure why we'd want to expose "Debug script" to mortal users.

Ditto for the jargony differences between kill plugin / kill webprocess / stop script. I think Chrome got the UI right here, simply offering "kill page" (although I'd probably want to call it "close tab"). I'm a little more sympathetic to having "kill plugin" (ideally as the only choice, shown when we know it's a plugin causing the problem)... But I'd suspect that often just results in broken pages and it should still just kill the tab.

I'm not sure what's being asked for here. Needs a UI spec before we can implement something. So back to phlsa per comment 1/2?
Flags: needinfo?(dolske) → needinfo?(philipp)
I think a big piece here is what sort of details we want to expose. We know the filename and line number of the script. I think we might be able to figure out what tab the script is from. We also know the name of the plugin if it's a plugin hang. Is that useful info? The tab probably is. Would we name it via the title?

(In reply to Justin Dolske [:Dolske] from comment #18)
> Is this something for developers only? I'm not sure why we'd want to expose
> "Debug script" to mortal users.

That's true, but I don't know how we would tell if someone is a developer. The non-e10s slow script dialog (admittedly not a UX triumph itself) also includes a debug option I think.

> Ditto for the jargony differences between kill plugin / kill webprocess /
> stop script. I think Chrome got the UI right here, simply offering "kill
> page" (although I'd probably want to call it "close tab"). I'm a little more
> sympathetic to having "kill plugin" (ideally as the only choice, shown when
> we know it's a plugin causing the problem)... But I'd suspect that often
> just results in broken pages and it should still just kill the tab.

Kill plugin is usually pretty successful. The plugin is just replaced with a gray rectangle (and some sort of "your plugin died" text) and everything else works fine. In my experience the malefactor is usually a Flash ad, so the page is actually improved.

One thing I think we need to make clear is that the "Kill process" option will affect multiple tabs (possibly all of them). So "Kill page" is not ideal in that sense.
(In reply to Bill McCloskey (:billm) from comment #19)
> That's true, but I don't know how we would tell if someone is a developer.
> The non-e10s slow script dialog (admittedly not a UX triumph itself) also
> includes a debug option I think.

Given that we now have an edition targeted specifically toward developers, what if we had the debug view on by default exclusively on Aurora?
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #20)
> Given that we now have an edition targeted specifically toward developers,
> what if we had the debug view on by default exclusively on Aurora?

I suspect most developers still use release.
I think we should start really simple here with a simple generic message and a "Recover" button that does whatever we think is best - kills a plugin if that is the problem, or kills the content process if we detect a content hang. I don't think we should expose more fine grained options. In the console we can dump all the debug info we need to, and if dev edition engineers want to expand on this to provide debugging features around it they can get that filed and integrated into their regular development queue. Slow scripts, hung plugins, hung content scripts - lets just take the most direct and reliable method of recovering from these three and do it.

FYI 42 merged to aurora in about three weeks (2015-08-10), we need to get this polished up by then.
Let's see what Shu has to say. He implemented the initial debug feature pretty recently. I'd hate to see it eliminated for UX reasons.
Flags: needinfo?(shu)
(In reply to Bill McCloskey (:billm) from comment #23)
> Let's see what Shu has to say. He implemented the initial debug feature
> pretty recently. I'd hate to see it eliminated for UX reasons.

I actually agree that showing "Debug Script" isn't useful most of the time and should be hidden by default. I propose we hide it by default and add a pref in the devtools config that would cause it to be shown.
Flags: needinfo?(shu)
(In reply to Jim Mathies [:jimm] from comment #22)
> In the
> console we can dump all the debug info we need to, and if dev edition
> engineers want to expand on this to provide debugging features around it
> they can get that filed and integrated into their regular development queue.

I don't see why adding a pref to display a "Debug" button in addition to "Recover" needs to be done separately. No reason to have a feature regression in the interim.
Could we display the Debug button only if the Devtools are opened and/or we are in Developer Edition/Nightly?
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #26)
> Could we display the Debug button only if the Devtools are opened and/or we
> are in Developer Edition/Nightly?

The DevEd/Nightly part seems fine to me, but why is the "only when devtools are open" part preferable to a pref?
Right, "... or when a pref is set".
Blocks: 1164508
(In reply to Shu-yu Guo [:shu] from comment #25)
> (In reply to Jim Mathies [:jimm] from comment #22)
> > In the
> > console we can dump all the debug info we need to, and if dev edition
> > engineers want to expand on this to provide debugging features around it
> > they can get that filed and integrated into their regular development queue.
> 
> I don't see why adding a pref to display a "Debug" button in addition to
> "Recover" needs to be done separately. No reason to have a feature
> regression in the interim.

As long as it doesn't require herculean effort to engineer then I think that's fine. Tie it to a devtools pref wfm too.
(In reply to Jim Mathies [:jimm] from comment #22)
> I think we should start really simple here with a simple generic message and
> a "Recover" button that does whatever we think is best - kills a plugin if
> that is the problem, or kills the content process if we detect a content
> hang. I don't think we should expose more fine grained options. In the
> console we can dump all the debug info we need to, and if dev edition
> engineers want to expand on this to provide debugging features around it
> they can get that filed and integrated into their regular development queue.
> Slow scripts, hung plugins, hung content scripts - lets just take the most
> direct and reliable method of recovering from these three and do it.
> 
> FYI 42 merged to aurora in about three weeks (2015-08-10), we need to get
> this polished up by then.

I agree 100%. I also like Yorics idea of tying it to whether or not devtools are open (or a pref in dev tools).

So am I right in summing up that we'd have a single notification bar for the content process and plugins with a single »Recover« button (unless devtools...)?

One further note here: If we kill the content process, we should probably do some auto-recovery. The user has just clicked a »Recover« button and is then presented with a »tabs crashed« message, which is less than great. Can we just automatically reload the tab when the process is ended through that notification?
Flags: needinfo?(philipp)
Flags: needinfo?(cweiner)
(In reply to Philipp Sackl [:phlsa] (Firefox UX) please use needinfo from comment #30)
> (In reply to Jim Mathies [:jimm] from comment #22)
> > I think we should start really simple here with a simple generic message and
> > a "Recover" button that does whatever we think is best - kills a plugin if
> > that is the problem, or kills the content process if we detect a content
> > hang. I don't think we should expose more fine grained options. In the
> > console we can dump all the debug info we need to, and if dev edition
> > engineers want to expand on this to provide debugging features around it
> > they can get that filed and integrated into their regular development queue.
> > Slow scripts, hung plugins, hung content scripts - lets just take the most
> > direct and reliable method of recovering from these three and do it.
> > 
> > FYI 42 merged to aurora in about three weeks (2015-08-10), we need to get
> > this polished up by then.
> 
> I agree 100%. I also like Yorics idea of tying it to whether or not devtools
> are open (or a pref in dev tools).
> 
> So am I right in summing up that we'd have a single notification bar for the
> content process and plugins with a single »Recover« button (unless
> devtools...)?

Yes.

> One further note here: If we kill the content process, we should probably do
> some auto-recovery. The user has just clicked a »Recover« button and is then
> presented with a »tabs crashed« message, which is less than great. Can we
> just automatically reload the tab when the process is ended through that
> notification?

Interesting question, we have to be careful about reloading content that has just had some sort of issue. I think we could retry once and then do something different with the tab that had the issue, which sounds like a UX question.

Removing 2 week old ni to cweiner@mozilla.com to get it back into triage and keep this bug moving.
Flags: needinfo?(cweiner)
> > One further note here: If we kill the content process, we should probably do
> > some auto-recovery. The user has just clicked a »Recover« button and is then
> > presented with a »tabs crashed« message, which is less than great. Can we
> > just automatically reload the tab when the process is ended through that
> > notification?
> 
> Interesting question, we have to be careful about reloading content that has
> just had some sort of issue. I think we could retry once and then do
> something different with the tab that had the issue, which sounds like a UX
> question.
> 
> Removing 2 week old ni to cweiner@mozilla.com to get it back into triage and
> keep this bug moving.

We want to display the tab crashed UI so users have the chance to enter information associated with the crash report. So I guess we don't want to do this, unless UX has some alternative suggestions.
Flags: needinfo?(philipp)
(In reply to Jim Mathies [:jimm] from comment #32)
> > > One further note here: If we kill the content process, we should probably do
> > > some auto-recovery. The user has just clicked a »Recover« button and is then
> > > presented with a »tabs crashed« message, which is less than great. Can we
> > > just automatically reload the tab when the process is ended through that
> > > notification?
> > 
> > Interesting question, we have to be careful about reloading content that has
> > just had some sort of issue. I think we could retry once and then do
> > something different with the tab that had the issue, which sounds like a UX
> > question.
> > 
> > Removing 2 week old ni to cweiner@mozilla.com to get it back into triage and
> > keep this bug moving.
> 
> We want to display the tab crashed UI so users have the chance to enter
> information associated with the crash report. So I guess we don't want to do
> this, unless UX has some alternative suggestions.

Then we should not call the button »recover«. If we do, it will seem like the recovery has failed and caused the crash.
Unfortunately, I can't think of good labels at the moment...
Flags: needinfo?(philipp)
Attachment #8632052 - Attachment is obsolete: true
Keywords: uiwanted
How about using just this as the notificaiton bar:
This page is unresponsive. [Kill page] [Wait] [x]

I'm pretty sure the lingo is not optimal here (we shouldn't talk about killing things), but I can't figure out anything better.

Matej, any thoughts?
For context: The »Kill page« button would crash the tab and display the tab crashed UI. We talked about labeling it »Recover« but it seems weird to make users click »Recover« and then present them with crash message.
Flags: needinfo?(matej)
(In reply to Philipp Sackl [:phlsa] (Firefox UX) please use needinfo from comment #34)
> How about using just this as the notificaiton bar:
> This page is unresponsive. [Kill page] [Wait] [x]
> 
> I'm pretty sure the lingo is not optimal here (we shouldn't talk about
> killing things), but I can't figure out anything better.
> 
> Matej, any thoughts?
> For context: The »Kill page« button would crash the tab and display the tab
> crashed UI. We talked about labeling it »Recover« but it seems weird to make
> users click »Recover« and then present them with crash message.

Could it say something like "Close tab" or "Stop loading" instead?
Flags: needinfo?(matej)
(In reply to Matej Novak [:matej] from comment #35)
> (In reply to Philipp Sackl [:phlsa] (Firefox UX) please use needinfo from
> comment #34)
> > How about using just this as the notificaiton bar:
> > This page is unresponsive. [Kill page] [Wait] [x]
> > 
> > I'm pretty sure the lingo is not optimal here (we shouldn't talk about
> > killing things), but I can't figure out anything better.
> > 
> > Matej, any thoughts?
> > For context: The »Kill page« button would crash the tab and display the tab
> > crashed UI. We talked about labeling it »Recover« but it seems weird to make
> > users click »Recover« and then present them with crash message.
> 
> Could it say something like "Close tab" or "Stop loading" instead?

It's not necessarily loading, and I don't think we're closing the tab (it'll still have back/forward history and so on).

Could we just say "Stop it", or if we feel that's too vague (kinda deliberate!) then maybe "Stop [this] page" ?
(In reply to :Gijs Kruitbosch from comment #36)
> (In reply to Matej Novak [:matej] from comment #35)
> > (In reply to Philipp Sackl [:phlsa] (Firefox UX) please use needinfo from
> > comment #34)
> > > How about using just this as the notificaiton bar:
> > > This page is unresponsive. [Kill page] [Wait] [x]
> > > 
> > > I'm pretty sure the lingo is not optimal here (we shouldn't talk about
> > > killing things), but I can't figure out anything better.
> > > 
> > > Matej, any thoughts?
> > > For context: The »Kill page« button would crash the tab and display the tab
> > > crashed UI. We talked about labeling it »Recover« but it seems weird to make
> > > users click »Recover« and then present them with crash message.
> > 
> > Could it say something like "Close tab" or "Stop loading" instead?
> 
> It's not necessarily loading, and I don't think we're closing the tab (it'll
> still have back/forward history and so on).
> 
> Could we just say "Stop it", or if we feel that's too vague (kinda
> deliberate!) then maybe "Stop [this] page" ?

A page isn't an action, though. What is it that we're stopping? I think we should give some indication as to what is being stopped, interrupted, etc. Is there another word we could use instead of "loading"?
(In reply to Matej Novak [:matej] from comment #37)
> (In reply to :Gijs Kruitbosch from comment #36)
> > Could we just say "Stop it", or if we feel that's too vague (kinda
> > deliberate!) then maybe "Stop [this] page" ?
> 
> A page isn't an action, though. What is it that we're stopping? I think we
> should give some indication as to what is being stopped, interrupted, etc.
> Is there another word we could use instead of "loading"?

I mean, we're stopping a script or plugin. But "Stop script or plugin" is clunky (and why doesn't the browser know which it is?) and still not a verb.

We don't know what the script/plugin is doing - it could be almost anything (it could be loading data, it could be processing user input, it could be processing stuff it got from the internet or other people, it could just be stuck in a loop to try to DoS the user, ...). So it's hard/impossible to pick an appropriate verb for the "action" that the page is trying to get done that is making things slow. We don't know the intent.

I think if the label before it said "This page is making things slow" and then the button said "Stop it", that would actually be clear enough, by reference - stop making things slow!

Does that make sense? I'm just amateur'ing here, so... ;-)
Flags: needinfo?(matej)
(In reply to :Gijs Kruitbosch from comment #38)
> (In reply to Matej Novak [:matej] from comment #37)
> > (In reply to :Gijs Kruitbosch from comment #36)
> > > Could we just say "Stop it", or if we feel that's too vague (kinda
> > > deliberate!) then maybe "Stop [this] page" ?
> > 
> > A page isn't an action, though. What is it that we're stopping? I think we
> > should give some indication as to what is being stopped, interrupted, etc.
> > Is there another word we could use instead of "loading"?
> 
> I mean, we're stopping a script or plugin. But "Stop script or plugin" is
> clunky (and why doesn't the browser know which it is?) and still not a verb.
> 
> We don't know what the script/plugin is doing - it could be almost anything
> (it could be loading data, it could be processing user input, it could be
> processing stuff it got from the internet or other people, it could just be
> stuck in a loop to try to DoS the user, ...). So it's hard/impossible to
> pick an appropriate verb for the "action" that the page is trying to get
> done that is making things slow. We don't know the intent.
> 
> I think if the label before it said "This page is making things slow" and
> then the button said "Stop it", that would actually be clear enough, by
> reference - stop making things slow!
> 
> Does that make sense? I'm just amateur'ing here, so... ;-)

I think changing the label to something like that would definitely make "Stop it" more clear. Good suggestion.
Flags: needinfo?(matej)
Assignee: nobody → jmathies
So just to confirm, the string in the notification should be:

"This page is making things slow"

and the buttons available would be:

"Stop it" "Wait" and "X" (close) 

With close basically being the same as "Wait".

Are we sure about the word 'things' in the text? Maybe this should be "your browser", "the browser", or maybe the brand name.
Flags: needinfo?(matej)
(In reply to Jim Mathies [:jimm] from comment #40)
> So just to confirm, the string in the notification should be:
> 
> "This page is making things slow"
> 
> and the buttons available would be:
> 
> "Stop it" "Wait" and "X" (close) 
> 
> With close basically being the same as "Wait".
> 
> Are we sure about the word 'things' in the text? Maybe this should be "your
> browser", "the browser", or maybe the brand name.

I would say either "This page is slowing down your browser" or "This page is making Firefox slow" (though I think the former is probably more factually accurate).
Flags: needinfo?(matej)
Thanks!
No longer blocks: 1164508
Duplicate of this bug: 1164508
Attached patch wip (obsolete) — Splinter Review
No longer blocks: e10s-miscblockers
Attached patch wip (obsolete) — Splinter Review
Attachment #8676316 - Attachment is obsolete: true
Attached patch wip (obsolete) — Splinter Review
Attachment #8676320 - Attachment is obsolete: true
Blocks: 1129174
Blocks: 1216782
Attached patch wip (obsolete) — Splinter Review
this is pretty much complete.
Attachment #8676330 - Attachment is obsolete: true
Attached patch hang monitor changes v.1 (obsolete) — Splinter Review
Attachment #8676803 - Attachment is obsolete: true
Attached patch hang monitor changes v.1 (obsolete) — Splinter Review
Attachment #8676934 - Attachment is obsolete: true
Attachment #8676935 - Flags: review?(mconley)
Attached patch tests (obsolete) — Splinter Review
Attachment #8676938 - Flags: review?(mconley)
Comment on attachment 8676935 [details] [diff] [review]
hang monitor changes v.1

Review of attachment 8676935 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good. Just a few questions and suggestions, but I'm pretty happy with it. I'll be happy to give an r+ once my questions get cleared up.

::: browser/base/content/tabbrowser.xml
@@ +38,5 @@
>  
>        <property name="tabContextMenu" readonly="true"
>                  onget="return this.tabContainer.contextMenu;"/>
>  
> +      <property name="scriptTerminated"

This scriptTerminated property is functionally equivalent to a field. Can we not just use a field then?

::: browser/modules/ProcessHangMonitor.jsm
@@ +27,5 @@
> +   * selects "Wait" in an existing notification.
> +   */
> +  get HANG_EXPIRATION_TIME() {
> +    try {
> +      return Services.prefs.getIntPref("browser.hangNotification.expiration");

Should we add these prefs? Also, is it necessary to read them every time, or can we memoize them, like:

get HANG_EXPIRATION_TIME() {
  let time;
  try {
    time = Services.prefs.getIntPref("browser.hangNotification.expiration");
  } catch (e) {
    time = 10000;
  }

  delete this.HANG_EXPIRATION_TIME;
  return this.HANG_EXPIRATION_TIME = time;
}

@@ +125,2 @@
>      if (!report) {
>        return;

Wait, hold on - so suppose we're showing the process-hang notification for a background tab that's misbehaving... when the user clicks "Stop It", we'll do nothing if there's no hang report for the selected tab? That seems bad.

Like, if the user is at about:newtab (which is still non-remote), then clicking Stop It for a blocked up content process is going to be ineffective, right?

I know that's not really what you're fixing here, but I think it's worth filing.

@@ +138,5 @@
> +
> +        // Set a timer to remove the script terminated attribute after
> +        // a short wait. We don't want to confuse hangs caused by
> +        // different scripts or pages.
> +        let timer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);

According to https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsITimer, we need to hold a reference to the timer in order to ensure that the timer is fired.

@@ +168,5 @@
> +
> +    // NOTE, we didn't call userCanceled on nsIHangReport here. This insures
> +    // we don't repeatedly generate and cache crash report data for this hang
> +    // in the process hang reporter. It already has one report for the browser
> +    // proccess we want it hold on to.

typo: "process", "we want it to hold onto"

@@ +191,5 @@
> +          this._activeReports.set(report, timer);
> +          break;
> +        }
> +      }
> +    }.bind(this), this.HANG_EXPIRATION_TIME, timer.TYPE_ONE_SHOT);

Can you use an arrow function instead of the {}.bind(this) pattern?

Also, you need to hold a reference to this timer, as above.

@@ +341,3 @@
>  
>      let buttons = [{
> +        label: bundle.getString("processHang.button1.label"),

It's not very easy to go from the string keyname to the callback, as the keyname "processHang.button1.label" doesn't really give good clues about what the button should do.

Can you rename these keys so that they reflect what the button's function is? That way, reviewers don't need to look at the translation strings to be sure.

::: dom/ipc/ProcessHangMonitor.cpp
@@ +858,5 @@
> +{
> +  MOZ_RELEASE_ASSERT(NS_IsMainThread());
> +
> +  // generates a crash report that includes all relevant processes.
> +  TerminatePlugin();

I don't think I understand this. We're terminating the plugin and all content processes in this method, and we want crash reports... will TerminatePlugin generate a crash report for the content processes if there doesn't happen to be a plugin process running?
Attachment #8676935 - Flags: review?(mconley) → review-
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #51)
> > +      <property name="scriptTerminated"
> 
> This scriptTerminated property is functionally equivalent to a field. Can we
> not just use a field then?

I thought this is the preferred relationship between fields and properties. A field is internal to tabbrowser, a property is public?

> ::: browser/modules/ProcessHangMonitor.jsm
> @@ +27,5 @@
> > +   * selects "Wait" in an existing notification.
> > +   */
> > +  get HANG_EXPIRATION_TIME() {
> > +    try {
> > +      return Services.prefs.getIntPref("browser.hangNotification.expiration");
> 
> Should we add these prefs?

I added these mostly for testing purposes. I thought about adding them to firefox.js, but wasn't sure if we wanted to expose them through about:config. I'm still not sure.

> Also, is it necessary to read them every time, or
> can we memoize them, like:

> get HANG_EXPIRATION_TIME() {
>   let time;
>   try {
>     time = Services.prefs.getIntPref("browser.hangNotification.expiration");
>   } catch (e) {
>     time = 10000;
>   }
> 
>   delete this.HANG_EXPIRATION_TIME;
>   return this.HANG_EXPIRATION_TIME = time;
> }

Hmm, ok, what does this do? It looks like the module still tries to read the pref every time. 

> @@ +125,2 @@
> >      if (!report) {
> >        return;
> 
> Wait, hold on - so suppose we're showing the process-hang notification for a
> background tab that's misbehaving... when the user clicks "Stop It", we'll
> do nothing if there's no hang report for the selected tab? That seems bad.
> 
> Like, if the user is at about:newtab (which is still non-remote), then
> clicking Stop It for a blocked up content process is going to be
> ineffective, right?
> 
> I know that's not really what you're fixing here, but I think it's worth
> filing.

I'll ask billm, the actually lookup is based on the nsIFrameLoaderOwner. Not sure if that works around the issue.  

> 
> @@ +138,5 @@
> > +
> > +        // Set a timer to remove the script terminated attribute after
> > +        // a short wait. We don't want to confuse hangs caused by
> > +        // different scripts or pages.
> > +        let timer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> 
> According to
> https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/
> Interface/nsITimer, we need to hold a reference to the timer in order to
> ensure that the timer is fired.
> 
> @@ +168,5 @@
> > +
> > +    // NOTE, we didn't call userCanceled on nsIHangReport here. This insures
> > +    // we don't repeatedly generate and cache crash report data for this hang
> > +    // in the process hang reporter. It already has one report for the browser
> > +    // proccess we want it hold on to.
> 
> typo: "process", "we want it to hold onto"
> 
> @@ +191,5 @@
> > +          this._activeReports.set(report, timer);
> > +          break;
> > +        }
> > +      }
> > +    }.bind(this), this.HANG_EXPIRATION_TIME, timer.TYPE_ONE_SHOT);
> 
> Can you use an arrow function instead of the {}.bind(this) pattern?
> 
> Also, you need to hold a reference to this timer, as above.

hmm, that sucks. I'll take a look.

> @@ +341,3 @@
> >  
> >      let buttons = [{
> > +        label: bundle.getString("processHang.button1.label"),
> 
> It's not very easy to go from the string keyname to the callback, as the
> keyname "processHang.button1.label" doesn't really give good clues about
> what the button should do.
> 
> Can you rename these keys so that they reflect what the button's function
> is? That way, reviewers don't need to look at the translation strings to be
> sure.

sure.

> ::: dom/ipc/ProcessHangMonitor.cpp
> @@ +858,5 @@
> > +{
> > +  MOZ_RELEASE_ASSERT(NS_IsMainThread());
> > +
> > +  // generates a crash report that includes all relevant processes.
> > +  TerminatePlugin();
> 
> I don't think I understand this. We're terminating the plugin and all
> content processes in this method, and we want crash reports... will
> TerminatePlugin generate a crash report for the content processes if there
> doesn't happen to be a plugin process running?

TerminatePlugin only kills the plugin process and takes a set of reports for related processes. TerminateAllApplicable() pretty hard core, I might remove this based on something bill said in the meeting today.
> > ::: browser/modules/ProcessHangMonitor.jsm
> > @@ +27,5 @@
> > > +   * selects "Wait" in an existing notification.
> > > +   */
> > > +  get HANG_EXPIRATION_TIME() {
> > > +    try {
> > > +      return Services.prefs.getIntPref("browser.hangNotification.expiration");
> > 
> > Should we add these prefs?
> 
> I added these mostly for testing purposes. I thought about adding them to
> firefox.js, but wasn't sure if we wanted to expose them through
> about:config. I'm still not sure.
> 
> > Also, is it necessary to read them every time, or
> > can we memoize them, like:

Note, reading them every time is the behavior I want.
Comment on attachment 8676938 [details] [diff] [review]
tests

Review of attachment 8676938 [details] [diff] [review]:
-----------------------------------------------------------------

This is a great test - thanks jimm.

I have no questions here - just a number of suggestions which you can take or drop at your discretion.

::: browser/modules/test/browser_ProcessHangNotifications.js
@@ +1,1 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public

New tests are supposed to be Creative Commons - so you can just get rid of this, I think.

@@ +1,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +Components.utils.import("resource://gre/modules/BrowserUtils.jsm");

You should be able to use Cu in this scope.

@@ +4,5 @@
> +
> +Components.utils.import("resource://gre/modules/BrowserUtils.jsm");
> +Cu.import("resource://gre/modules/UpdateUtils.jsm");
> +
> +function getNotificationBox() {

Instead of getting the most recent browser window, this should probably take the window as an argument. Then, have the callers pass in "window".

@@ +9,5 @@
> +  let win = Services.wm.getMostRecentWindow("navigator:browser");
> +  return win.document.getElementById("high-priority-global-notificationbox");
> +}
> +
> +function promisePopupNotificationShown(aName) {

PopupNotification is probably not the right term. We have too many notification types unfortunately, and "PopupNotification" is the term we use for things like the GeoLocation permissions popup, or other permissions popups.

Also, Promise.defer() is, I believe, going away soon. We need to use the following pattern:

function promiseNotificationShown(aName) {
  return new Promise((resolve) => {
    let box = getNotificationBox();
    box.addEventListener("foo"... function active() {
      box.removeEventListener(...
      is(...
      resolve();
    });
  });
}

@@ +15,5 @@
> +  let notification = getNotificationBox();
> +  notification.addEventListener("AlertActive", function active() {
> +    notification.removeEventListener("AlertActive", active, true);
> +    is(notification.allNotifications.length, 1, "Notification Displayed.");
> +    deferred.resolve();

Maybe this Promise should resolve passing back the notification? See below for rationale.

@@ +53,5 @@
> +  TEST_CALLBACK_TERMPROC: 3,
> +  TEST_CALLBACK_TERMALL: 4,
> +
> +  _hangType: 1,
> +  _tcb: function (aCallbackType) { BrowserUtils.dumpLn("default callback", aCallbackType); },

This is going to just print, which is fine I guess - but it looks like these tests use promiseReportCallMade to ensure the callback is called with particular "types".

Do we also want to ensure that gTestHangReport's callback is never called unexpectedly? If so, perhaps instead of printing this stuff out (or after printing it out), use ok(false) to create a test error.

@@ +63,5 @@
> +  set hangType(aValue) {
> +    this._hangType = aValue;
> +  },
> +
> +  set testCallback(aValue) {

This is never used.

@@ +95,5 @@
> +    return true;
> +  }
> +};
> +
> +let buttonCount = (UpdateUtils.UpdateChannel == "aurora" ? 3 : 2);

I think this is worth explaining with a quick comment.

@@ +98,5 @@
> +
> +let buttonCount = (UpdateUtils.UpdateChannel == "aurora" ? 3 : 2);
> +
> +/**
> + * Test if hang reports receive a terminate script callback when the user selects

Thanks for the great test documentation!

@@ +102,5 @@
> + * Test if hang reports receive a terminate script callback when the user selects
> + * stop in response to a script hang.
> + */
> +
> +add_task(function* () {

It's handy to give these generator functions names, since it's easier to see which test is running (and failing) in the output log.

Example:

add_task(function* receive_terminate_script_cb() {
  //...
});

@@ +105,5 @@
> +
> +add_task(function* () {
> +  let promise = promisePopupNotificationShown("process-hang");
> +  Services.obs.notifyObservers(gTestHangReport, "process-hang-report", null);
> +  yield promise;

It promiseNotificationShown resolves with the notification, line 111 should not be necessary any more - you can do:

let promise = promiseNotificationShown("process-hang");
Services.obs.notifyObservers(gTestHangReport, "process-hang-report", null);
let notification = yield promise;

let buttons = notification.getElementsByTagName("button");
...
etc

@@ +109,5 @@
> +  yield promise;
> +
> +  let notification = getNotificationBox();
> +  let buttons = notification.currentNotification.getElementsByTagName("button");
> +  is(buttons.length, buttonCount, "proper number of buttons");

This check is happening in each test. If promiseNotificationShown is going to resolve with the notification, can we also do the buttons length check before resolving too?

@@ +123,5 @@
> + * Test if hang reports receive terminate process callbacks after script
> + * termination fails to fix a hung browser. Relies on the previous task.
> + */
> +
> +add_task(function* () {

Please add a function name.

@@ +143,5 @@
> + * Test if hang reports receive user canceled callbacks after a user selects wait
> + * and the browser frees up from a script hang on its own.
> + */
> +
> +add_task(function* () {

Please add a function name.

@@ +174,5 @@
> + * Tests if hang reports receive a terminate applicable callback when the user selects
> + * stop in response to a plugin hang.
> + */
> +
> +add_task(function* () {

Please add a function name.
Attachment #8676938 - Flags: review?(mconley) → review+
Comment on attachment 8676935 [details] [diff] [review]
hang monitor changes v.1

Review of attachment 8676935 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/tabbrowser.xml
@@ +38,5 @@
>  
>        <property name="tabContextMenu" readonly="true"
>                  onget="return this.tabContainer.contextMenu;"/>
>  
> +      <property name="scriptTerminated"

Another thing to keep in mind, that I just remembered, is that if the XBL binding is ever removed and reapplied (like if the browser is removed from the DOM and re-inserted during a remoteness switch), the value stashed in here will be lost.

Do we account for that?
Attached patch hang monitor changes v.2 (obsolete) — Splinter Review
Attachment #8676935 - Attachment is obsolete: true
Attachment #8676938 - Attachment is obsolete: true
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #51)
> Wait, hold on - so suppose we're showing the process-hang notification for a
> background tab that's misbehaving... when the user clicks "Stop It", we'll
> do nothing if there's no hang report for the selected tab? That seems bad.
> 
> Like, if the user is at about:newtab (which is still non-remote), then
> clicking Stop It for a blocked up content process is going to be
> ineffective, right?

Turns out the notification bar hides when you flip to about:newtab, so you can't interact with it unless you're really quick with the mouse. The notification displays for every tab is the window, so it's not a per tab situation.
Attachment #8677637 - Attachment is obsolete: true
Attachment #8677643 - Flags: review?(mconley)
Attached patch tests v.2Splinter Review
Comment on attachment 8677643 [details] [diff] [review]
hang monitor changes v.2

Review of attachment 8677643 [details] [diff] [review]:
-----------------------------------------------------------------

I think this looks good. Much simpler than the original implementation too. Thanks jimm!

::: browser/modules/ProcessHangMonitor.jsm
@@ +152,5 @@
> +
> +          // Store the timer in the active reports map. If we receive a new
> +          // observer notification for this hang, we'll redisplay the browser
> +          // notification in reportHang below. If we do not receive a new
> +          // observer, timer will take care fo cleaning up resources associated

Nit barely worth mentioning, but "fo" -> "of"
Attachment #8677643 - Flags: review?(mconley) → review+
https://hg.mozilla.org/mozilla-central/rev/88b02bfcc8c7
https://hg.mozilla.org/mozilla-central/rev/c7641c846430
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 44
sorry had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=2514354&repo=mozilla-central
Status: RESOLVED → REOPENED
Flags: needinfo?(jmathies)
Resolution: FIXED → ---
Looks like prior tests leave notification around breaking this. I added some code that closes these before the test runs, lets see if it helps.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bbfe71886024
Flags: needinfo?(jmathies)
Comment on attachment 8678996 [details] [diff] [review]
notification touchup

This appears to have fixed the problem on linux. Waiting on runs from osx and win.
Attachment #8678996 - Flags: review?(mconley)
Comment on attachment 8678996 [details] [diff] [review]
notification touchup

Review of attachment 8678996 [details] [diff] [review]:
-----------------------------------------------------------------

Hm. I think it'd probably be better if the tests that caused those notifications removed them properly instead. Do we know where they came from?
Comment on attachment 8678996 [details] [diff] [review]
notification touchup

sure, I'll push something to try.
Flags: needinfo?(jmathies)
Attachment #8678996 - Flags: review?(mconley)
Hey Carsten, one thing I've noticed in trying to reproduce this is that I get different test suite blocks on try compared to what get run on mozilla-central. For example in your merge the test failed in linux pgo bc4, but when I pushed to try the test showed up in linux opt bc5. The test blocks look the same but I'm guessing there are differences otherwise the bc values would match up. Any idea why I get two different outcomes between try and mc?

mc merge:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=5ca03a00d268
try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=44ba17a0a2c8

I'm tempted to reland on mi and see if the failures show up again.
Flags: needinfo?(cbook)
> I'm tempted to reland on mi and see if the failures show up again.

Actually if I can't reproduce on try reliable I'd prefer to land 'notification touchup' with this to be safe and be done with it.
Relanded on inbound, I haven't had any luck reproducing on try. If these tests regress please just backout the tests patch (4aa77fc9d786). I'll file a follow up on landing those.
https://hg.mozilla.org/mozilla-central/rev/3e035332fbd3
https://hg.mozilla.org/mozilla-central/rev/4aa77fc9d786
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: Firefox 44 → Firefox 45
Flags: needinfo?(cbook)
removing the b2g 2.5 flag since this commit has been reverted due to an incorrect merge, sorry for the confusion
Depends on: 1232204
See Also: → 1365648
You need to log in before you can comment on or make changes to this bug.