Last Comment Bug 710431 - Telemetry for browser window open
: Telemetry for browser window open
Status: RESOLVED FIXED
:
Product: Firefox
Classification: Client Software
Component: General (show other bugs)
: Trunk
: All All
: -- normal (vote)
: Firefox 14
Assigned To: Neil Deakin
:
Mentors:
Depends on:
Blocks: 671038 671039
  Show dependency treegraph
 
Reported: 2011-12-13 15:08 PST by Drew Willcoxon :adw
Modified: 2012-07-16 13:58 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (16.10 KB, patch)
2011-12-16 18:33 PST, Drew Willcoxon :adw
gavin.sharp: review-
Details | Diff | Review
patch (6.47 KB, patch)
2011-12-20 17:50 PST, Drew Willcoxon :adw
dao+bmo: review-
Details | Diff | Review
patch (6.47 KB, patch)
2011-12-21 12:26 PST, Drew Willcoxon :adw
dao+bmo: review-
Details | Diff | Review
patch (6.47 KB, patch)
2011-12-21 13:11 PST, Drew Willcoxon :adw
dao+bmo: review-
Details | Diff | Review
patch (6.47 KB, patch)
2011-12-21 13:19 PST, Drew Willcoxon :adw
no flags Details | Diff | Review
Patch to add new window telemetry (3.03 KB, patch)
2012-02-14 07:50 PST, Neil Deakin
dietrich: feedback+
Details | Diff | Review
updated patch (3.50 KB, patch)
2012-03-01 10:38 PST, Neil Deakin
no flags Details | Diff | Review
updated patch 2 (3.52 KB, patch)
2012-03-01 10:50 PST, Neil Deakin
no flags Details | Diff | Review
updated patch 3 (4.35 KB, patch)
2012-03-05 11:45 PST, Neil Deakin
no flags Details | Diff | Review
address gavin's comment (4.28 KB, patch)
2012-03-05 12:01 PST, Neil Deakin
dao+bmo: review+
Details | Diff | Review
updated patch (3.72 KB, patch)
2012-03-09 07:21 PST, Neil Deakin
no flags Details | Diff | Review
Remove line that causes an error (3.66 KB, patch)
2012-03-14 07:53 PDT, Neil Deakin
dao+bmo: review+
Details | Diff | Review

Description Drew Willcoxon :adw 2011-12-13 15:08:30 PST
Telemetry for browser window open and close.
Comment 1 Drew Willcoxon :adw 2011-12-16 18:33:55 PST
Created attachment 582459 [details] [diff] [review]
patch

This is dangerously misleading, especially for window closings, since there's no way to plot the number of tabs in a window when it's opened or closed.  For reference, Jason asked about two-dimensional data here:

http://blog.mozilla.com/tglek/2011/06/22/developers-how-to-submit-telemetry-data/
Comment 2 Dão Gottwald [:dao] 2011-12-17 06:33:33 PST
Do we even have reason to believe that new windows sometimes open slowly? It always seemed rather snappy to me even on my old netbook.

As for closing, what exactly do you want to measure? The window visibly disappearing or the window being destroyed? I think we mostly care about the former, and again I don't really anticipate a problem there.
Comment 3 Drew Willcoxon :adw 2011-12-18 20:34:54 PST
(In reply to Dão Gottwald [:dao] from comment #2)
> Do we even have reason to believe that new windows sometimes open slowly?

I don't know, maybe whoever is behind this drive to telemeter everything does.  I presume there's such a person.  Maybe we can find him and ask him what it is he wants to measure exactly.  Maybe he'd tell us we're supposed to figure it out.

But if the hypothesis is "windows always open and close snappily," it seems to me there's only upside to testing it.  (Of course we can multiply that hypothesis by various definitions of "open" and "close"...)

> As for closing, what exactly do you want to measure?

What seems to me important to measure is the time from when the user initiates the close to the first point after the window disappears that Firefox is responsive.
Comment 4 Dão Gottwald [:dao] 2011-12-19 02:02:00 PST
(In reply to Drew Willcoxon :adw from comment #3)
> (In reply to Dão Gottwald [:dao] from comment #2)
> > Do we even have reason to believe that new windows sometimes open slowly?
> 
> I don't know, maybe whoever is behind this drive to telemeter everything
> does.  I presume there's such a person.  Maybe we can find him and ask him
> what it is he wants to measure exactly.  Maybe he'd tell us we're supposed
> to figure it out.

Well, we can't measure everything, so we may as well stop here.

> But if the hypothesis is "windows always open and close snappily," it seems
> to me there's only upside to testing it.  (Of course we can multiply that
> hypothesis by various definitions of "open" and "close"...)

One downside is code complexity, as this seems to be a rather complex patch. Another downside would be measuring something wrongly and artificially creating a problem where there's none.

> > As for closing, what exactly do you want to measure?
> 
> What seems to me important to measure is the time from when the user
> initiates the close to the first point after the window disappears that
> Firefox is responsive.

See, I don't think your patch is measuring this, as the window should be destroyed asynchronously.
Comment 5 Dão Gottwald [:dao] 2011-12-19 02:05:45 PST
Comment on attachment 582459 [details] [diff] [review]
patch

>@@ -3628,16 +3630,43 @@ function OpenBrowserWindow()

>+  // window-close telemetry
>+  //
>+  // The "close" event is also telemetered (see WindowIsClosing), but it's not
>+  // fired when the final tab in a window is closed.
>+  win.addEventListener("DOMWindowClose", function onClose(event) {
>+    if (event.target != win)
>+      return;
>+    win.removeEventListener("DOMWindowClose", onClose, false);
>+    telemeterWindowClose(win);
>+  }, false);

This will keep the window where this code runs alive when it's closed before 'win'.
Comment 6 Drew Willcoxon :adw 2011-12-19 11:31:35 PST
(In reply to Dão Gottwald [:dao] from comment #4)
> Well, we can't measure everything, so we may as well stop here.

What?

> See, I don't think your patch is measuring this

It's not.
Comment 7 Dão Gottwald [:dao] 2011-12-19 12:32:06 PST
(In reply to Drew Willcoxon :adw from comment #6)
> (In reply to Dão Gottwald [:dao] from comment #4)
> > Well, we can't measure everything, so we may as well stop here.
> 
> What?

More precisely, we could measure everything but it would make our code base unmanagable. We need to decide what measurements make sense.
Comment 8 Drew Willcoxon :adw 2011-12-19 13:02:20 PST
Sure.  The only reason I'm working on this is that Dietrich asked me to.
Comment 9 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-12-19 19:10:03 PST
Comment on attachment 582459 [details] [diff] [review]
patch

Rather than changing the various places we open windows, could we simplify this by just using a JS module that uses nsIWindowWatcher.registerNotification to record _all_ window opens, add the load/paint handlers, and then record data accordingly based on window type?
Comment 10 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-12-19 19:16:57 PST
Comment on attachment 582459 [details] [diff] [review]
patch

Dao's right about this entraining windows for too long. Also I wouldn't expect telemeterWindowClose(window) to work very well, since it would be attempting to run JS code on a window after it's been destroyed (or keep the window alive indefinitely, I'm not sure which).
Comment 11 Drew Willcoxon :adw 2011-12-20 17:50:10 PST
Created attachment 583355 [details] [diff] [review]
patch
Comment 12 Dietrich Ayala (:dietrich) 2011-12-20 18:09:32 PST
Dao, we're measuring this because it's one of the things that some people do complain about (though, it could be due to tab or page loading things, just making it appear to be window). Your experience is not necessarily representative of a typical user's experience. Also, we want to measure all of a set basic primary UI actions to first figure out what is or is not a problem in the wild, and how much the reality differs from our assumptions about these things.

If you have more questions about what we're measuring and why, come join the Snappy meetings on Thursdays, or ask on IRC in #perf. https://etherpad.mozilla.org/snappy
Comment 13 Dão Gottwald [:dao] 2011-12-20 23:57:56 PST
(In reply to Dietrich Ayala (:dietrich) from comment #12)
> Dao, we're measuring this because it's one of the things that some people do
> complain about (though, it could be due to tab or page loading things, just
> making it appear to be window)

What exactly do they complain about?
Comment 14 Dão Gottwald [:dao] 2011-12-21 00:01:39 PST
I have no idea why this patch still observes outer-window-destroyed. Why is this relevant for what people allegedly complain about?
Comment 15 Dão Gottwald [:dao] 2011-12-21 01:54:41 PST
> Your experience is not necessarily representative of a typical user's experience.

I should clarify. This isn't about what I do or do not experience myself. I have heard people complain about shutdown being slow (bug 662444). I have heard people complain about closing tabs being slow (bug 708673). I haven't heard people complain about closing windows being slow, which matches my expectation that (1) windows should disappear instantaneously even if (2) it can take a significant amount of time until they're destroyed. (2) would be bad if user interaction with another window was blocked throughout this time span, but it's my understanding that this isn't the case. If we have doubts about (1) then let's measure this, although it would be good to see at least a tiny piece of circumstantial evidence backing up these doubts.
Comment 16 Drew Willcoxon :adw 2011-12-21 12:26:48 PST
Created attachment 583583 [details] [diff] [review]
patch
Comment 17 Dão Gottwald [:dao] 2011-12-21 13:09:36 PST
Comment on attachment 583583 [details] [diff] [review]
patch

This appears to be exactly the same patch as the previous one.
Comment 18 Drew Willcoxon :adw 2011-12-21 13:11:25 PST
Created attachment 583601 [details] [diff] [review]
patch
Comment 19 Dão Gottwald [:dao] 2011-12-21 13:15:03 PST
Erm, are you doing this on purpose?
Comment 20 Dão Gottwald [:dao] 2011-12-21 13:18:37 PST
Comment on attachment 583601 [details] [diff] [review]
patch

I think you're generating bad and potentially misleading data here. Please say why you think you're generating good data. Until then my r- is going to stick.
Comment 21 Drew Willcoxon :adw 2011-12-21 13:19:58 PST
Created attachment 583607 [details] [diff] [review]
patch
Comment 22 Dão Gottwald [:dao] 2011-12-21 13:22:12 PST
Drew, you can't hide or rewrite history. My r- is still there.
Comment 23 Dietrich Ayala (:dietrich) 2011-12-21 16:14:40 PST
We're measuring window close exactly because we don't know for sure if it's a problem or not. I've experienced slow window close, but like I said before, that doesn't matter at all - we need data from the wild.

We've been consistently wrong (completely, absolutely and totally wrong in some cases) about where we think performance problems exist and why. We're not measuring everything. The list of what to measure came as a result of the Product team talking with SUMO and others (see https://etherpad.mozilla.org/desktop-responsiveness), and from talking to the Performance team.

If the measurement yields no interesting information, we can remove it.

WRT to comment 15: Yes, windows should close without user-perceptible lag. If you disagree, please come to the Snappy meetings - it'd be great to have some more input about what the criteria for reasonable user action responsiveness should be.
Comment 24 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-12-21 16:57:35 PST
We absolutely do need to be careful about how we interpret the data we're gathering - no question. But in a lot of cases, we can't fully make the "good data" vs. "bad data" determination before we've actually collected it, and seen whether the results are interesting. The fact that window destruction time doesn't necessarily correlate to a user-perceivable effects ("window close lag") doesn't mean that it isn't useful to measure. This kind of data could help us detect whether windows are being leaked, for example.
Comment 25 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-12-21 16:59:13 PST
Looking at the patch, I think it would be best to avoid adding a separate outer-window-destroyed listener for each window, and instead track ID+times separately and use a single listener. It might be best at this point to just spin that portion off to a separate bug.
Comment 26 Dão Gottwald [:dao] 2011-12-22 01:36:31 PST
(In reply to Dietrich Ayala (:dietrich) from comment #23)
> We're measuring window close exactly because we don't know for sure if it's
> a problem or not. I've experienced slow window close, but like I said
> before, that doesn't matter at all - we need data from the wild.

What exactly is it you saw? This is critical for measuring it! I don't understand why you say it doesn't matter.

(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #24)
> We absolutely do need to be careful about how we interpret the data we're
> gathering - no question. But in a lot of cases, we can't fully make the
> "good data" vs. "bad data" determination before we've actually collected it,
> and seen whether the results are interesting. The fact that window
> destruction time doesn't necessarily correlate to a user-perceivable effects
> ("window close lag")

"Not necessarily" is an understatement. As far as I can tell, we know it measures something different.

> doesn't mean that it isn't useful to measure. This kind
> of data could help us detect whether windows are being leaked, for example.

Then this bug and the histogram name and description should say so. It's not just that the data is potentially uninteresting, but it can be misleading. Improving perceived window closing performance could mean to destroy windows more lazily and "regress" the numbers collected here.
Comment 27 Dietrich Ayala (:dietrich) 2011-12-22 10:04:51 PST
(In reply to Dão Gottwald [:dao] from comment #26)
> (In reply to Dietrich Ayala (:dietrich) from comment #23)
> > We're measuring window close exactly because we don't know for sure if it's
> > a problem or not. I've experienced slow window close, but like I said
> > before, that doesn't matter at all - we need data from the wild.
> 
> What exactly is it you saw? This is critical for measuring it! I don't
> understand why you say it doesn't matter.

Usually what happens is that I close the window using the keyboard shortcut, very briefly it flickers visually out of view (but definitely more than 50ms), and the File menu stays highlighted for a period of time.

This is irrelevant to some extent because it may not represent what other users experience, or the behavior on Windows or Linux, and maybe not on Mac either. The measurement should take into account the full breadth of user action -> window finished closing.
Comment 28 Dão Gottwald [:dao] 2011-12-22 11:10:44 PST
(In reply to Dietrich Ayala (:dietrich) from comment #27)
> Usually what happens is that I close the window using the keyboard shortcut,
> very briefly it flickers visually out of view (but definitely more than
> 50ms), and the File menu stays highlighted for a period of time.
> 
> This is irrelevant to some extent because it may not represent what other
> users experience, or the behavior on Windows or Linux, and maybe not on Mac
> either. The measurement should take into account the full breadth of user
> action -> window finished closing.

Sure, but if we have evidence for specific pain points, we can try pin them down directly. The lagged menu highlighting sounds like a Mac-specific quirk. I don't know to what extent we control that. Might be worth filing a Core/Widget: Cocoa bug on. The window flicker could actually fall into the same category.
Comment 29 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-12-22 11:24:12 PST
(In reply to Dietrich Ayala (:dietrich) from comment #27)
> The measurement should take into account the full breadth of user
> action -> window finished closing.

Dao's point is that nsGlobalWindow destruction does not at all map to "window finished closing", from a user's point of view. (Arguably, the end of the "close" event handler probably doesn't map very well to "user action", either - but getting very precise measurements is going to be tricky, so in a lot of cases we need to settle for the best we can get.)

He's right about this, and it's good to be clear about what we're measuring ahead of time, lest there be any confusion in interpretation of the data down the line.
Comment 30 Dietrich Ayala (:dietrich) 2011-12-22 11:51:18 PST
I have no idea about how to implement this, haven't even looked at drew's patch, so you don't have to sell me on what you're saying :)

My goal here is to measure how long it takes to close windows, from the user's perspective. Of course I'd like it to be done in a way that makes sense, and provide data that is useful.

Are you saying that it is just not possible?
Comment 31 Dietrich Ayala (:dietrich) 2012-02-01 06:49:17 PST
Narrowing this bug to window open, since there's no agreement on whether we should measure window close. Neil is going to take a look at it.
Comment 32 Neil Deakin 2012-02-14 07:50:07 PST
Created attachment 597032 [details] [diff] [review]
Patch to add new window telemetry

This patch starts the timer when the new window command is issued, and uses the first document-shown notification as the end time. The sequence of new window opening is that after the native widget is shown, then painted, this notification is fired. Testing shows that it fires before the content document is loaded and painted.
Comment 33 Dão Gottwald [:dao] 2012-02-28 06:49:07 PST
Comment on attachment 597032 [details] [diff] [review]
Patch to add new window telemetry

>+      Services.obs.removeObserver(newDocumentShown, "document-shown", false);

removeObserver takes two arguments only.
Comment 34 Dietrich Ayala (:dietrich) 2012-02-29 09:34:44 PST
Comment on attachment 597032 [details] [diff] [review]
Patch to add new window telemetry

Review of attachment 597032 [details] [diff] [review]:
-----------------------------------------------------------------

looks good, just fix these and Dao's comments.

::: browser/base/content/browser.js
@@ +3687,5 @@
> +      Services.obs.removeObserver(newDocumentShown, "document-shown", false);
> +      histogram.add(Date.now() - startTime);
> +    }
> +  };
> +  Services.obs.addObserver(newDocumentShown, "document-shown", false);

use TelemetryStopwatch instead:

http://felipe.wordpress.com/2012/02/13/telemetry-stopwatch/

::: toolkit/components/telemetry/TelemetryHistograms.h
@@ +325,5 @@
>  HISTOGRAM_BOOLEAN(FX_KEYWORD_URL_USERSET, "Firefox: keyword.URL has a user-set value")
>  HISTOGRAM(FX_IDENTITY_POPUP_OPEN_MS, 1, 1000, 10, EXPONENTIAL, "Firefox: Time taken by the identity popup to open in milliseconds")
>  HISTOGRAM(FX_APP_MENU_OPEN_MS, 1, 1000, 10, EXPONENTIAL, "Firefox: Time taken by the app-menu opening in milliseconds")
>  HISTOGRAM(FX_BOOKMARKS_TOOLBAR_INIT_MS, 50, 5000, 10, EXPONENTIAL, "Firefox: Time to initialize the bookmarks toolbar view (ms)")
> +HISTOGRAM(FX_NEW_WINDOW, 1, 10000, 20, EXPONENTIAL, "Firefox: Time taken to open a new browser window")

add the unit (ms) to the description
Comment 35 Neil Deakin 2012-03-01 10:38:57 PST
Created attachment 602019 [details] [diff] [review]
updated patch
Comment 36 Dietrich Ayala (:dietrich) 2012-03-01 10:43:31 PST
Comment on attachment 602019 [details] [diff] [review]
updated patch

Review of attachment 602019 [details] [diff] [review]:
-----------------------------------------------------------------

bah, missed this in the last pass: please include the units in the key name as well, eg: + "_MS"
Comment 37 Neil Deakin 2012-03-01 10:50:56 PST
Created attachment 602029 [details] [diff] [review]
updated patch 2
Comment 38 Dão Gottwald [:dao] 2012-03-02 09:11:00 PST
Comment on attachment 602029 [details] [diff] [review]
updated patch 2

>+Cu.import("resource:///modules/TelemetryStopwatch.jsm");

This can be done lazily using XPCOMUtils.defineLazyModuleGetter (and should be moved down to the other lazily loaded modules).

> function OpenBrowserWindow()
> {
>+  TelemetryStopwatch.start("FX_NEW_WINDOW_MS");
>+  var histogram = Services.telemetry.getHistogramById("FX_NEW_WINDOW_MS");

'histogram' is unused

>+  function newDocumentShown(doc, topic, data) {
>+    if (topic == "document-shown" && doc != document &&
>+        doc.location == "chrome://browser/content/browser.xul") {
>+      Services.obs.removeObserver(newDocumentShown, "document-shown");
>+      TelemetryStopwatch.finish("FX_NEW_WINDOW_MS");
>+    }
>+  };

What happens when opening two windows simultaneously? It looks like TelemetryStopwatch.start will refuse to operate in that case. A saner API would probably allow both calls and return handles so that both can be finished independently...
Comment 39 Dão Gottwald [:dao] 2012-03-02 09:15:54 PST
> What happens when opening two windows simultaneously? It looks like
> TelemetryStopwatch.start will refuse to operate in that case. A saner API
> would probably allow both calls and return handles so that both can be
> finished independently...

Apparently TelemetryStopwatch.start accepts a second argument and will use it as something like a handle. So I guess you should create a dummy object for this.
Comment 40 Neil Deakin 2012-03-05 11:45:18 PST
Created attachment 602993 [details] [diff] [review]
updated patch 3
Comment 41 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-03-05 11:48:08 PST
You can use defineLazyModuleGetter for the lazy getter instead of rolling your own, fwiw.
Comment 42 Neil Deakin 2012-03-05 12:01:16 PST
Created attachment 603003 [details] [diff] [review]
address gavin's comment
Comment 43 Dão Gottwald [:dao] 2012-03-06 08:04:50 PST
Comment on attachment 603003 [details] [diff] [review]
address gavin's comment

> function OpenBrowserWindow()
> {
>+  var telemetryObj = {};
>+  TelemetryStopwatch.start("FX_NEW_WINDOW_MS", telemetryObj);
>+
>+  function newDocumentShown(doc, topic, data) {
>+    if (topic == "document-shown" && doc != document &&

nit: new line after the first &&

>+        doc.location == "chrome://browser/content/browser.xul" &&
>+        doc.defaultView == telemetryObj.window) {

This can be just doc.defaultView == win

>+  telemetryObj.window = win;

And this can be removed.
Comment 44 Neil Deakin 2012-03-09 07:21:51 PST
Created attachment 604409 [details] [diff] [review]
updated patch
Comment 45 Matt Brubeck (:mbrubeck) 2012-03-09 09:05:51 PST
Sorry, I had to back this out on inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6cb832275a45

because of mochitest-browser-chrome test failures:
https://tbpl.mozilla.org/php/getParsedLog.php?id=9943826&tree=Mozilla-Inbound
Comment 46 Neil Deakin 2012-03-14 07:53:16 PDT
Created attachment 605752 [details] [diff] [review]
Remove line that causes an error

For some reason some windows cause an error when trying to access the location, but I don't think the location check is needed.

Bug 732564 wanted to remove the document-shown notification, but I'm not sure if there's something that can be used instead.
Comment 47 Mounir Lamouri (:mounir) 2012-03-21 03:53:26 PDT
https://hg.mozilla.org/mozilla-central/rev/07ed71d1e7ab
Comment 48 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-07-16 13:58:05 PDT
So should this actually work? I don't see FX_NEW_WINDOW_MS in Telemetry Evolution.

Note You need to log in before you can comment on or make changes to this bug.