Closed Bug 1120650 Opened 10 years ago Closed 10 years ago

Count the number of times the Slow Script Dialog is shown during a session

Categories

(Toolkit :: Telemetry, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: vladan, Assigned: avih)

References

Details

Attachments

(1 file, 3 obsolete files)

To better evaluate e10s vs non-e10s performance on user desktops, we should report the number of times Firefox shows a "slow script" dialog during a session.
Summary: Count the number of times the Slow Script Dialog pops up during a session → Count the number of times the Slow Script Dialog is shown during a session
1. Slow script detected != slow script dialog.

There are several cases where the dialog is not displayed due to constraints, preferences, etc. I'm guessing we want detected rather than displayed, right?

2. This suffers from the same issue which we had with activeTicks - count histograms are the best fit for this, both semantically and from API availability point of view, but the dashboard doesn't display them good enough, so we'll want simple measurement - but those are not easily accessible from nsGlobalWindow.cpp (where the slow script thingy happen).

I think we should make count histograms reported automatically also as simple measurements. E.g. count.<name> or count-<name> etc. This way the Telemetry API can be used properly and semantics will be kept correctly.
Flags: needinfo?(vdjeric)
We'll discuss this in person this week and document the decision here
Flags: needinfo?(vdjeric)
(In reply to Vladan Djeric (:vladan) from comment #2)
> We'll discuss this in person this week and document the decision here

We've decided to use a count histogram, and fix the telemetry dashboard.

Counting at sGlobalWindow::ShowSlowScriptDialog() works fine on non-e10s.

However, on e10s, while right now the accumulation doesn't show on about:telemetry, I still debugged it to check if this code path is being entered.

So it is entered on e10s as well, but it seems that it's entered repeatedly every second until the user chooses an action (via the yellow alert bar at the top rather than a "proper" dialog like it does for non-e10s).

I looked around at the code, and I'm not sure what would be the best place to add the counting - such that it will count only once also on e10s (once the telemetry/e10s thingy gets fixed and becomes useful).
Bill, considering comment 3, and seeing that you landed the e10s handling of slow scripts at bug 1118618, would you know how to count slow-script detection/prompt only once such that it works for both e10s and non-e10s?
Flags: needinfo?(wmccloskey)
Unfortunately there isn't one single place where we could handle e10s and non-e10s.

In e10s, you could stick the telemetry here:
http://mxr.mozilla.org/mozilla-central/source/browser/modules/ProcessHangMonitor.jsm#281
That should hit approximately once per hang. The exception is if the user closes the notification box while the hang is still in progress. Then it will be hit again. But that happens with the slow script dialog too. Note that you'll also be notified about plugin hangs here. You can check report.hangType == report.SLOW_SCRIPT to only count slow scripts.

For non-e10s, as you know, you'll want to instrument here:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#11006
Flags: needinfo?(wmccloskey)
Attached patch bug1120650.patch (obsolete) — Splinter Review
I think this is as good as we can get without having a single place to probe it.
Attachment #8557654 - Flags: review?(wmccloskey)
Attachment #8557654 - Flags: review?(vdjeric)
Comment on attachment 8557654 [details] [diff] [review]
bug1120650.patch

Review of attachment 8557654 [details] [diff] [review]:
-----------------------------------------------------------------

- I think the e10s slow script dialog is buggy, oftentimes it keeps reappearing in rapid succession on the same page regardless of which option I choose.
- We'll have to ignore e10s results from this probe until it's fixed. Bill, is there a bug for fixing the slow-script dialog?

::: dom/base/nsGlobalWindow.cpp
@@ +11020,5 @@
> +  } else {
> +    // - non-e10s.
> +    // On non-e10s Firefox - we're entering this function once per slow script,
> +    // so we can telemetry-probe it here.
> +    // On e10s - this function is entered in a loop, so don't probe here.

You're saying we should ignore this histogram when it's reported by the content process on E10S?

@@ +11022,5 @@
> +    // On non-e10s Firefox - we're entering this function once per slow script,
> +    // so we can telemetry-probe it here.
> +    // On e10s - this function is entered in a loop, so don't probe here.
> +    // Instead, for e10s we probe once at ProcessHangsMonitor.jsm
> +    Telemetry::Accumulate(Telemetry::SLOW_SCRIPT_NOTICE_COUNT, true);

the last parameter is a uint32_t, so pass 1 to Accumulate instead

::: toolkit/components/telemetry/Histograms.json
@@ +7308,5 @@
>      "description": "The number of Sync 1.5 migrations completed by Android Sync."
> +  },
> +  "SLOW_SCRIPT_NOTICE_COUNT": {
> +    "alert_emails": ["perf-telemetry-alerts@mozilla.com"],
> +    "expires_in_version": "default",

"default" means "this histogram was created before we had the 'expires_in_version' field". It's essentially a "TODO". Use "never" instead
Attachment #8557654 - Flags: review?(vdjeric) → review+
(In reply to Vladan Djeric (:vladan) from comment #7)
> - I think the e10s slow script dialog is buggy, oftentimes it keeps
> reappearing in rapid succession on the same page regardless of which option
> I choose.
> - We'll have to ignore e10s results from this probe until it's fixed. Bill,
> is there a bug for fixing the slow-script dialog?

I'm not aware of any bugs. If you see something, please file!
Comment on attachment 8557654 [details] [diff] [review]
bug1120650.patch

Review of attachment 8557654 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/modules/ProcessHangMonitor.jsm
@@ +278,5 @@
>        return;
>      }
>  
> +    // on e10s this counts slow-script notice only once.
> +    //(this path is not entered on non-e10s)

Please capitalize, use periods, and put one space after //. So:
// On e10s this counts the slow script notice only once.
// (This patch is not entered on non-e10s.)

::: dom/base/nsGlobalWindow.cpp
@@ +11016,5 @@
>      }
>  
>      return ContinueSlowScriptAndKeepNotifying;
> +
> +  } else {

Since we return in the branch above, the style rules say that there shouldn't be an else. Your code can just follow the |if| statment without any indentation.

@@ +11020,5 @@
> +  } else {
> +    // - non-e10s.
> +    // On non-e10s Firefox - we're entering this function once per slow script,
> +    // so we can telemetry-probe it here.
> +    // On e10s - this function is entered in a loop, so don't probe here.

In e10s we exit this function before we get a chance to update the histogram.
Attachment #8557654 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #8)
> I'm not aware of any bugs. If you see something, please file!

Sadly I can't reproduce this issue anymore with today's Nightly, but I'll keep an e10s profile running and file a bug if I can get it to happen again
Depends on: 1129174
Attached patch bug1120650.v2.patch (obsolete) — Splinter Review
Addressed comments, carrying r+.

Vladan, I'm f? you since you had a concern:

(In reply to Vladan Djeric (:vladan) from comment #7)
> ::: dom/base/nsGlobalWindow.cpp
> @@ +11020,5 @@
> > +  } else {
> > +    // - non-e10s.
> > +    // On non-e10s Firefox - we're entering this function once per slow script,
> > +    // so we can telemetry-probe it here.
> > +    // On e10s - this function is entered in a loop, so don't probe here.
> 
> You're saying we should ignore this histogram when it's reported by the
> content process on E10S?

I'm saying that I'm adding a sample to this histogram from two different places, where one of these places is only invoked on e10s, and the other is only invoked on non-e10s.
Attachment #8557654 - Attachment is obsolete: true
Attachment #8559828 - Flags: review+
Attachment #8559828 - Flags: feedback?(vdjeric)
(In reply to Avi Halachmi (:avih) from comment #11)
> I'm saying that I'm adding a sample to this histogram from two different
> places, where one of these places is only invoked on e10s, and the other is
> only invoked on non-e10s.

Ok.

One last question: in the e10s version, does this really count the number of times there's a long-running content script or would it also count any content process hang? e.g. the content process getting hung on plugin code (plugin hang)
Assignee: nobody → avihpit
Flags: needinfo?(avihpit)
(In reply to Vladan Djeric (:vladan) from comment #12)
> One last question: in the e10s version, does this really count the number of
> times there's a long-running content script or would it also count any
> content process hang? e.g. the content process getting hung on plugin code
> (plugin hang)

Oh yeah, we need to fix that. Avi, see comment 5. You need to check if report.hangType == report.SLOW_SCRIPT.
Avi: report the different hangTypes in e10s in separate histograms
(In reply to Bill McCloskey (:billm) from comment #13)
> Oh yeah, we need to fix that. Avi, see comment 5. You need to check if
> report.hangType == report.SLOW_SCRIPT.

Yup, sorry, missed that. Done now.

(In reply to Vladan Djeric (:vladan) from comment #14)
> Avi: report the different hangTypes in e10s in separate histograms

Added a new histogram PLUGIN_HANG_NOTICE_COUNT which is only sampled on e10s since apparently we have sufficient plugin-related telemetry probes on non-e10s (according to Vladan on IRC).

While I did test that the slow-script count works on e10s and non-e10s, I don't have an ready environment to test plugin hangs.

I'll land the patch after I'll verify that the plugin hangs count does indeed work in e10s.
Attachment #8559828 - Attachment is obsolete: true
Attachment #8559828 - Flags: feedback?(vdjeric)
Flags: needinfo?(avihpit)
Attachment #8559944 - Flags: review+
Attachment #8559944 - Flags: feedback?(wmccloskey)
Attachment #8559944 - Flags: feedback?(vdjeric)
Comment on attachment 8559944 [details] [diff] [review]
bug1120650.v3.patch - split to script/plugin notices on e10s

Review of attachment 8559944 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Histograms.json
@@ +7368,5 @@
> +  "PLUGIN_HANG_NOTICE_COUNT": {
> +    "alert_emails": ["perf-telemetry-alerts@mozilla.com"],
> +    "expires_in_version": "never",
> +    "kind": "count",
> +    "description": "Count plugin hang notices"

"Count plugin hang notices in e10s"
Attachment #8559944 - Flags: feedback?(vdjeric) → feedback+
(In reply to Avi Halachmi (:avih) from comment #15)
> While I did test that the slow-script count works on e10s and non-e10s, I
> don't have an ready environment to test plugin hangs.

I'm having hard time reproducing plugin hang notice on e10s in order to test that the new probe works.

I just built m-c on windows and installed latest flash. I tested with the testcase from bug 805591 comment 37, but it hanged the whole browser for ~30s I think and didn't pop a plugin hanged notice at any time.

On the same test case, I also tried to suspend either or both of the plugin-container processes (suggested by aklotz) but the hang notice didn't appear.

I'm a bit scared to just try a flash gaming site, since there's a wild 0day flash exploit which apparently hit about 1800 domains and wasn't patched yet with the latest flash from adobe.

Bill, how should I test that the new plugin-hangs probe works as expected before I land this patch?
Flags: needinfo?(wmccloskey)
Attachment #8559944 - Flags: feedback?(wmccloskey) → feedback+
(In reply to Avi Halachmi (:avih) from comment #17)
> (In reply to Avi Halachmi (:avih) from comment #15)
> > While I did test that the slow-script count works on e10s and non-e10s, I
> > don't have an ready environment to test plugin hangs.
> 
> I'm having hard time reproducing plugin hang notice on e10s in order to test
> that the new probe works.
> 
> I just built m-c on windows and installed latest flash. I tested with the
> testcase from bug 805591 comment 37, but it hanged the whole browser for
> ~30s I think and didn't pop a plugin hanged notice at any time.

You can change dom.ipc.plugins.contentTimeoutSecs. It defaults to 45 seconds.

> Bill, how should I test that the new plugin-hangs probe works as expected
> before I land this patch?

As part of the build process we compile ./dist/plugins/libnptest.so (perhaps the name is platform-dependent). If you copy this into the plugins dir in your test profile, you can then use some HTML like this to hang:

<html>
<body onload="load()">
  <embed id="plugin1" type="application/x-test" width="400" height="400" drawmode="solid" color="FF00FFFF">

  <script>
function load() {
  var p = document.getElementById("plugin1");
  p.setColor("FF0000FF");
  p.hang();
}
</script>
</body>
</html>
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #18)
> You can change dom.ipc.plugins.contentTimeoutSecs. It defaults to 45 seconds.

Yeah, I was expecting ~10s like it does for the slow script dialog, so I aborted many of my experiments with suspending processes etc befoew 45s elapsed. Except just few minutes ago where I left the browser hanged in the background, and then some time later I noticed the message.

However, for that single notice - the patch counted 2 plugin-hanged notices.

billm just told me on IRC that this can indeed happen and it's probably a bug at the plugin hang UI, so we can go ahead and land this patch (after updating the description like vlad asked).

I'll do that tomorrow.
https://hg.mozilla.org/mozilla-central/rev/7b2845351442
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: