The e10s slow script notification bar keeps spontaneously disappearing and reappearing in cycles.

RESOLVED FIXED in Firefox 48

Status

()

P1
normal
RESOLVED FIXED
4 years ago
2 years ago

People

(Reporter: jujjyl, Assigned: billm)

Tracking

Trunk
Firefox 48
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10s+, firefox41 affected, firefox46 wontfix, firefox47 wontfix, firefox48 fixed)

Details

Attachments

(1 attachment, 3 obsolete attachments)

(Reporter)

Description

4 years ago
STR:

1. Navigate to infiniteloop2.html from bug https://bugzilla.mozilla.org/show_bug.cgi?id=1164286 .
2. Wait for several minutes and observe the UI visually.

Observed:

After some seconds, the slow script notification bar appears. When no response is given in some amount of time, the notification bar spontaneously disappears. Then, again after a few seconds, the notification bar appears again, and so on; i.e. the bar keeps appearing and disappearing in cycles.

Excepted:

After presented the first time, the slow script notification bar should be shown as long until a) the user has taken his time to respond to it.

A question is what should happen if the script "resolves" (completes execution and yields back to the event loop) itself before the user has had the time to  respond. E.g. if there is a requestAnimationFrame()-based animation that takes, say, 20 seconds to run through each cycle, then effectively the page is hung, but simply hiding the slow script notification bar after each rAF finishes might not be good user experience. Perhaps the slow script notification bar should autohide itself only if the browser has been idle from running user JS scripts for e.g. >1 seconds?
(Reporter)

Updated

4 years ago
tracking-e10s: --- → ?
Flags: needinfo?(twalker)
Jukka,

I am unable to reproduce this with latest Nightly build on Mac. What platform are you on?  Also does this occur for you with all of your add-ons disabled (that or try in a fresh profile).
Flags: needinfo?(twalker) → needinfo?(jujjyl)
(Reporter)

Comment 2

4 years ago
I am on Windows, although I can see the issue appears on OS X on well, although more erratic and not as consistent. Here is a video (I sped it up 6x to keep the file size down) that shows the issue:

https://dl.dropboxusercontent.com/u/40949268/Bugs/slow_script_dialog.m2v
Flags: needinfo?(jujjyl)
tracking-e10s: ? → m8+
tracking-e10s: m8+ → +
User Agent  Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:46.0) Gecko/20100101 Firefox/46.0
User Agent  Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0
User Agent  Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0

Build ID  20160208004010

Tested on WIndows 7, Mac OS 10.10 and Ubuntu 15.04 on the 46.0a2 Aurora build. I can reproduce this issue on all the platforms mentioned.
Priority: -- → P1
Created attachment 8735689 [details] [diff] [review]
check_responsive.patch
Assignee: nobody → blassey.bugs
Attachment #8735689 - Flags: review?(mconley)
Created attachment 8735692 [details] [diff] [review]
check_responsive.patch
Attachment #8735689 - Attachment is obsolete: true
Attachment #8735689 - Flags: review?(mconley)
Attachment #8735692 - Flags: review?(mconley)
To give some background, what I'm seeing is that our content process is sending out of memory notifications every second or two, and then there will be a long pause of up to 40s between notifications. I suspect we're hitting a particularly terrible GC/CC pause. Regardless, deciding that we're un-hung because of a lack of notifications alone seems wrong.
Created attachment 8736157 [details] [diff] [review]
check_responsive.patch

fixes the test
Attachment #8735692 - Attachment is obsolete: true
Attachment #8735692 - Flags: review?(mconley)
Attachment #8736157 - Flags: review?(mconley)
Comment on attachment 8736157 [details] [diff] [review]
check_responsive.patch

Review of attachment 8736157 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for investigating this, blassey.

Nothing major came up in my review, but I do think we might want billm to give this a quick look just in case, as the back-end for the process hang monitor stuff isn't really something I'm too too familiar with.

::: browser/modules/ProcessHangMonitor.jsm
@@ +420,5 @@
>     */
>    notify: function(timer) {
>      for (let [otherReport, otherTimer] of this._activeReports) {
>        if (otherTimer === timer) {
> +        let self = this;

Thankfully, with ES6 arrow functions, we don't need the self = this stuff anymore.

In an arrow function, the this is inherited from the enclosing context, so you can use:

otherReport.checkResponsive().then(() => {
  if (!timer.callback) { // callback is null, so this hasn't been reset
    this.removeActiveReport(otherReport);
    otherReport.userCanceled();
  }
});

::: dom/ipc/ProcessHangMonitor.cpp
@@ +778,5 @@
>  
>  NS_IMETHODIMP
> +HangMonitoredProcess::CheckResponsive(JS::MutableHandleValue aPromise)
> +{
> +    MOZ_RELEASE_ASSERT(NS_IsMainThread());

Busted indentation

@@ +795,5 @@
> +  TabParent* tabParent;
> +  for (size_t i = 0; i < tabs.Length(); i++) {
> +    TabParent* tp = TabParent::GetFrom(tabs[i]);
> +    if (tp->GetTabId() == tabId) {
> +      tabParent = tp;

I guess we can break out of this loop at this point?

@@ +799,5 @@
> +      tabParent = tp;
> +    }
> +  }
> +
> +  if (!tabParent) {

We might want to do this as well if the TabParent has somehow been Destroy'd, so maybe check IsDestroyed here (since, I suppose, theoretically this could be called on a TabParent at any time).

@@ +812,5 @@
> +  if (!window) {
> +    return NS_ERROR_UNEXPECTED;
> +  }
> +  nsCOMPtr<nsIDocShell> docShell = nsPIDOMWindowOuter::From(window)->GetDocShell();
> +  nsCOMPtr<nsIGlobalObject> sgo = docShell->GetScriptGlobalObject();

If 807-816 is all for getting at the SGO, I wonder if it could be made slightly simpler. If you update your idl so that it has the [implicit_jscontext] decorator, you'll get a JSContext here.

From there, I think you can use:

nsIGlobalObject* go = xpc::NativeGlobal(JS::CurrentGlobalOrNull(aCx));

Or is there a reason we want the SGO from the window with the hung tab, instead of from the context of the caller?

@@ +822,5 @@
> +  if (result.Failed()) {
> +    return result.StealNSResult();
> +  }
> +  tabParent->CheckResponsive(promise);
> +  aPromise.setObject(*promise->PromiseObj());

Also, fwiw, if you return the Promise as an nsISupports, I believe XPConnect will somehow "do the right thing", and JS consumers of this API will just get back a Promise that they don't need to QI or anything.

Not sure that adds much, but thought I'd mention it.

::: dom/ipc/TabParent.cpp
@@ +1612,5 @@
>  
> +bool
> +TabParent::RecvAckResponsive(const uint64_t& aId)
> +{
> +  if (mCheckResponsivePromises.find(aId) == mCheckResponsivePromises.end())

Needs braces, even around one-liners.

@@ +1615,5 @@
> +{
> +  if (mCheckResponsivePromises.find(aId) == mCheckResponsivePromises.end())
> +    return true;
> +
> +  if (!!mCheckResponsivePromises[aId]) {

Out of curiosity, when would this ever not be the case?

::: dom/ipc/TabParent.h
@@ +408,5 @@
>  
> +  virtual bool
> +  RecvAckResponsive(const uint64_t& aId) override;
> +
> +  void CheckResponsive(Promise*);

Style-wise, I think it's more in tune with the rest of this file to include a dummy name for the pointer.

::: dom/ipc/nsIHangReport.idl
@@ +63,5 @@
>    // Inquire whether the report is for a content process loaded by the given
>    // frameloader.
>    bool isReportForBrowser(in nsIFrameLoader aFrameLoader);
> +
> +  // Check to see if the process has become responsive

I think this could be made a little clearer by saying "Returns a Promise that resolves once the process has become responsive".
Attachment #8736157 - Flags: review?(wmccloskey)
Attachment #8736157 - Flags: review?(mconley)
Attachment #8736157 - Flags: review-
Created attachment 8737433 [details] [diff] [review]
patch

We actually already have a mechanism to find out when a hang has ended. It just isn't hooked up to the UI yet. This patch does that.
Attachment #8736157 - Attachment is obsolete: true
Attachment #8736157 - Flags: review?(wmccloskey)
Attachment #8737433 - Flags: review?(mrbkap)
Comment on attachment 8737433 [details] [diff] [review]
patch

Review of attachment 8737433 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ProcessHangMonitor.cpp
@@ +168,5 @@
>    }
>  
> +  void ClearHang() {
> +    mHangData = HangData();
> +    mBrowserDumpId = nsString();

mBrowserDumpId.Truncate();
Attachment #8737433 - Flags: review?(mrbkap) → review+

Comment 12

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/d20bb8617b25
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
Blake, should we uplift Brad's fix to 47 in preparation for our next e10s beta experiment? Could this bug contribute to our slow script count regression with e10s (bug 1249978) or is it just a visual bug?
status-firefox46: --- → wontfix
status-firefox47: --- → ?
Flags: needinfo?(mrbkap)
(In reply to Chris Peterson [:cpeterson] from comment #13)
> Blake, should we uplift Brad's fix to 47 in preparation for our next e10s
> beta experiment? Could this bug contribute to our slow script count
> regression with e10s (bug 1249978) or is it just a visual bug?

It looks like this could cause us to report multiple slow script dialogs for the same hang. In particular, if a content process is hung and the user doesn't respond to the dialog, we'll continue showing new slow script dialogs until either the process is killed or the user responds. We should probably uplift this patch to try to mitigate that.
Flags: needinfo?(mrbkap)
Blake, can you please shepherd this patch uplift to 47? Brad wrote this patch, but he is out on PTO for two weeks.
status-firefox47: ? → affected
Flags: needinfo?(mrbkap)
(In reply to Chris Peterson [:cpeterson] from comment #15)
> Blake, can you please shepherd this patch uplift to 47? Brad wrote this
> patch, but he is out on PTO for two weeks.

The patch that landed was billm's, I believe - and he's kinda working part time right now.
I don't think we need this for the experiment. I don't think it will affect the numbers for SLOW_SCRIPT_PAGE_COUNT (bug 1251667), and that's the measurement we're now gated on. It's a minor usability improvement for e10s users, but we're not actually releasing e10s in 47, so it doesn't seem worth it to me.

Blake, let me know if you disagree.
Yeah, this won't affect SLOW_SCRIPT_PAGE_COUNT (though it will affect SLOW_SCRIPT_NOTICE_COUNT), so we can probably leave it alone.
Flags: needinfo?(mrbkap)
Thanks. SGTM.
status-firefox47: affected → wontfix
Assignee: blassey.bugs → wmccloskey
Depends on: 1274367
Heads up, bug 1270729 landing should have an impact on this.

Updated

2 years ago
Depends on: 1316646
You need to log in before you can comment on or make changes to this bug.