Closed Bug 1164931 Opened 5 years ago Closed 4 years ago

The e10s slow script notification bar keeps spontaneously disappearing and reappearing in cycles.

Categories

(Firefox :: General, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 48
Tracking Status
e10s + ---
firefox41 --- affected
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 --- fixed

People

(Reporter: jujjyl, Assigned: billm)

References

Details

Attachments

(1 file, 3 obsolete files)

STR:

1. Navigate to infiniteloop2.html from bug https://bugzilla.mozilla.org/show_bug.cgi?id=1164286 .
2. Wait for several minutes and observe the UI visually.

Observed:

After some seconds, the slow script notification bar appears. When no response is given in some amount of time, the notification bar spontaneously disappears. Then, again after a few seconds, the notification bar appears again, and so on; i.e. the bar keeps appearing and disappearing in cycles.

Excepted:

After presented the first time, the slow script notification bar should be shown as long until a) the user has taken his time to respond to it.

A question is what should happen if the script "resolves" (completes execution and yields back to the event loop) itself before the user has had the time to  respond. E.g. if there is a requestAnimationFrame()-based animation that takes, say, 20 seconds to run through each cycle, then effectively the page is hung, but simply hiding the slow script notification bar after each rAF finishes might not be good user experience. Perhaps the slow script notification bar should autohide itself only if the browser has been idle from running user JS scripts for e.g. >1 seconds?
tracking-e10s: --- → ?
Jukka,

I am unable to reproduce this with latest Nightly build on Mac. What platform are you on?  Also does this occur for you with all of your add-ons disabled (that or try in a fresh profile).
Flags: needinfo?(twalker) → needinfo?(jujjyl)
I am on Windows, although I can see the issue appears on OS X on well, although more erratic and not as consistent. Here is a video (I sped it up 6x to keep the file size down) that shows the issue:

https://dl.dropboxusercontent.com/u/40949268/Bugs/slow_script_dialog.m2v
Flags: needinfo?(jujjyl)
User Agent  Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:46.0) Gecko/20100101 Firefox/46.0
User Agent  Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0
User Agent  Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0

Build ID  20160208004010

Tested on WIndows 7, Mac OS 10.10 and Ubuntu 15.04 on the 46.0a2 Aurora build. I can reproduce this issue on all the platforms mentioned.
Priority: -- → P1
Attached patch check_responsive.patch (obsolete) — Splinter Review
Assignee: nobody → blassey.bugs
Attachment #8735689 - Flags: review?(mconley)
Attached patch check_responsive.patch (obsolete) — Splinter Review
Attachment #8735689 - Attachment is obsolete: true
Attachment #8735689 - Flags: review?(mconley)
Attachment #8735692 - Flags: review?(mconley)
To give some background, what I'm seeing is that our content process is sending out of memory notifications every second or two, and then there will be a long pause of up to 40s between notifications. I suspect we're hitting a particularly terrible GC/CC pause. Regardless, deciding that we're un-hung because of a lack of notifications alone seems wrong.
Attached patch check_responsive.patch (obsolete) — Splinter Review
fixes the test
Attachment #8735692 - Attachment is obsolete: true
Attachment #8735692 - Flags: review?(mconley)
Attachment #8736157 - Flags: review?(mconley)
Comment on attachment 8736157 [details] [diff] [review]
check_responsive.patch

Review of attachment 8736157 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for investigating this, blassey.

Nothing major came up in my review, but I do think we might want billm to give this a quick look just in case, as the back-end for the process hang monitor stuff isn't really something I'm too too familiar with.

::: browser/modules/ProcessHangMonitor.jsm
@@ +420,5 @@
>     */
>    notify: function(timer) {
>      for (let [otherReport, otherTimer] of this._activeReports) {
>        if (otherTimer === timer) {
> +        let self = this;

Thankfully, with ES6 arrow functions, we don't need the self = this stuff anymore.

In an arrow function, the this is inherited from the enclosing context, so you can use:

otherReport.checkResponsive().then(() => {
  if (!timer.callback) { // callback is null, so this hasn't been reset
    this.removeActiveReport(otherReport);
    otherReport.userCanceled();
  }
});

::: dom/ipc/ProcessHangMonitor.cpp
@@ +778,5 @@
>  
>  NS_IMETHODIMP
> +HangMonitoredProcess::CheckResponsive(JS::MutableHandleValue aPromise)
> +{
> +    MOZ_RELEASE_ASSERT(NS_IsMainThread());

Busted indentation

@@ +795,5 @@
> +  TabParent* tabParent;
> +  for (size_t i = 0; i < tabs.Length(); i++) {
> +    TabParent* tp = TabParent::GetFrom(tabs[i]);
> +    if (tp->GetTabId() == tabId) {
> +      tabParent = tp;

I guess we can break out of this loop at this point?

@@ +799,5 @@
> +      tabParent = tp;
> +    }
> +  }
> +
> +  if (!tabParent) {

We might want to do this as well if the TabParent has somehow been Destroy'd, so maybe check IsDestroyed here (since, I suppose, theoretically this could be called on a TabParent at any time).

@@ +812,5 @@
> +  if (!window) {
> +    return NS_ERROR_UNEXPECTED;
> +  }
> +  nsCOMPtr<nsIDocShell> docShell = nsPIDOMWindowOuter::From(window)->GetDocShell();
> +  nsCOMPtr<nsIGlobalObject> sgo = docShell->GetScriptGlobalObject();

If 807-816 is all for getting at the SGO, I wonder if it could be made slightly simpler. If you update your idl so that it has the [implicit_jscontext] decorator, you'll get a JSContext here.

From there, I think you can use:

nsIGlobalObject* go = xpc::NativeGlobal(JS::CurrentGlobalOrNull(aCx));

Or is there a reason we want the SGO from the window with the hung tab, instead of from the context of the caller?

@@ +822,5 @@
> +  if (result.Failed()) {
> +    return result.StealNSResult();
> +  }
> +  tabParent->CheckResponsive(promise);
> +  aPromise.setObject(*promise->PromiseObj());

Also, fwiw, if you return the Promise as an nsISupports, I believe XPConnect will somehow "do the right thing", and JS consumers of this API will just get back a Promise that they don't need to QI or anything.

Not sure that adds much, but thought I'd mention it.

::: dom/ipc/TabParent.cpp
@@ +1612,5 @@
>  
> +bool
> +TabParent::RecvAckResponsive(const uint64_t& aId)
> +{
> +  if (mCheckResponsivePromises.find(aId) == mCheckResponsivePromises.end())

Needs braces, even around one-liners.

@@ +1615,5 @@
> +{
> +  if (mCheckResponsivePromises.find(aId) == mCheckResponsivePromises.end())
> +    return true;
> +
> +  if (!!mCheckResponsivePromises[aId]) {

Out of curiosity, when would this ever not be the case?

::: dom/ipc/TabParent.h
@@ +408,5 @@
>  
> +  virtual bool
> +  RecvAckResponsive(const uint64_t& aId) override;
> +
> +  void CheckResponsive(Promise*);

Style-wise, I think it's more in tune with the rest of this file to include a dummy name for the pointer.

::: dom/ipc/nsIHangReport.idl
@@ +63,5 @@
>    // Inquire whether the report is for a content process loaded by the given
>    // frameloader.
>    bool isReportForBrowser(in nsIFrameLoader aFrameLoader);
> +
> +  // Check to see if the process has become responsive

I think this could be made a little clearer by saying "Returns a Promise that resolves once the process has become responsive".
Attachment #8736157 - Flags: review?(wmccloskey)
Attachment #8736157 - Flags: review?(mconley)
Attachment #8736157 - Flags: review-
Attached patch patchSplinter Review
We actually already have a mechanism to find out when a hang has ended. It just isn't hooked up to the UI yet. This patch does that.
Attachment #8736157 - Attachment is obsolete: true
Attachment #8736157 - Flags: review?(wmccloskey)
Attachment #8737433 - Flags: review?(mrbkap)
Comment on attachment 8737433 [details] [diff] [review]
patch

Review of attachment 8737433 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ProcessHangMonitor.cpp
@@ +168,5 @@
>    }
>  
> +  void ClearHang() {
> +    mHangData = HangData();
> +    mBrowserDumpId = nsString();

mBrowserDumpId.Truncate();
Attachment #8737433 - Flags: review?(mrbkap) → review+
https://hg.mozilla.org/mozilla-central/rev/d20bb8617b25
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
Blake, should we uplift Brad's fix to 47 in preparation for our next e10s beta experiment? Could this bug contribute to our slow script count regression with e10s (bug 1249978) or is it just a visual bug?
Flags: needinfo?(mrbkap)
(In reply to Chris Peterson [:cpeterson] from comment #13)
> Blake, should we uplift Brad's fix to 47 in preparation for our next e10s
> beta experiment? Could this bug contribute to our slow script count
> regression with e10s (bug 1249978) or is it just a visual bug?

It looks like this could cause us to report multiple slow script dialogs for the same hang. In particular, if a content process is hung and the user doesn't respond to the dialog, we'll continue showing new slow script dialogs until either the process is killed or the user responds. We should probably uplift this patch to try to mitigate that.
Flags: needinfo?(mrbkap)
Blake, can you please shepherd this patch uplift to 47? Brad wrote this patch, but he is out on PTO for two weeks.
Flags: needinfo?(mrbkap)
(In reply to Chris Peterson [:cpeterson] from comment #15)
> Blake, can you please shepherd this patch uplift to 47? Brad wrote this
> patch, but he is out on PTO for two weeks.

The patch that landed was billm's, I believe - and he's kinda working part time right now.
I don't think we need this for the experiment. I don't think it will affect the numbers for SLOW_SCRIPT_PAGE_COUNT (bug 1251667), and that's the measurement we're now gated on. It's a minor usability improvement for e10s users, but we're not actually releasing e10s in 47, so it doesn't seem worth it to me.

Blake, let me know if you disagree.
Yeah, this won't affect SLOW_SCRIPT_PAGE_COUNT (though it will affect SLOW_SCRIPT_NOTICE_COUNT), so we can probably leave it alone.
Flags: needinfo?(mrbkap)
Assignee: blassey.bugs → wmccloskey
Depends on: 1274367
Heads up, bug 1270729 landing should have an impact on this.
Depends on: 1316646
You need to log in before you can comment on or make changes to this bug.