Closed Bug 710431 Opened 13 years ago Closed 12 years ago

Telemetry for browser window open

Categories

(Firefox :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 14

People

(Reporter: adw, Assigned: enndeakin)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 11 obsolete files)

Telemetry for browser window open and close.
Attached patch patch (obsolete) — Splinter Review
This is dangerously misleading, especially for window closings, since there's no way to plot the number of tabs in a window when it's opened or closed.  For reference, Jason asked about two-dimensional data here:

http://blog.mozilla.com/tglek/2011/06/22/developers-how-to-submit-telemetry-data/
Attachment #582459 - Flags: review?(gavin.sharp)
Do we even have reason to believe that new windows sometimes open slowly? It always seemed rather snappy to me even on my old netbook.

As for closing, what exactly do you want to measure? The window visibly disappearing or the window being destroyed? I think we mostly care about the former, and again I don't really anticipate a problem there.
(In reply to Dão Gottwald [:dao] from comment #2)
> Do we even have reason to believe that new windows sometimes open slowly?

I don't know, maybe whoever is behind this drive to telemeter everything does.  I presume there's such a person.  Maybe we can find him and ask him what it is he wants to measure exactly.  Maybe he'd tell us we're supposed to figure it out.

But if the hypothesis is "windows always open and close snappily," it seems to me there's only upside to testing it.  (Of course we can multiply that hypothesis by various definitions of "open" and "close"...)

> As for closing, what exactly do you want to measure?

What seems to me important to measure is the time from when the user initiates the close to the first point after the window disappears that Firefox is responsive.
(In reply to Drew Willcoxon :adw from comment #3)
> (In reply to Dão Gottwald [:dao] from comment #2)
> > Do we even have reason to believe that new windows sometimes open slowly?
> 
> I don't know, maybe whoever is behind this drive to telemeter everything
> does.  I presume there's such a person.  Maybe we can find him and ask him
> what it is he wants to measure exactly.  Maybe he'd tell us we're supposed
> to figure it out.

Well, we can't measure everything, so we may as well stop here.

> But if the hypothesis is "windows always open and close snappily," it seems
> to me there's only upside to testing it.  (Of course we can multiply that
> hypothesis by various definitions of "open" and "close"...)

One downside is code complexity, as this seems to be a rather complex patch. Another downside would be measuring something wrongly and artificially creating a problem where there's none.

> > As for closing, what exactly do you want to measure?
> 
> What seems to me important to measure is the time from when the user
> initiates the close to the first point after the window disappears that
> Firefox is responsive.

See, I don't think your patch is measuring this, as the window should be destroyed asynchronously.
Comment on attachment 582459 [details] [diff] [review]
patch

>@@ -3628,16 +3630,43 @@ function OpenBrowserWindow()

>+  // window-close telemetry
>+  //
>+  // The "close" event is also telemetered (see WindowIsClosing), but it's not
>+  // fired when the final tab in a window is closed.
>+  win.addEventListener("DOMWindowClose", function onClose(event) {
>+    if (event.target != win)
>+      return;
>+    win.removeEventListener("DOMWindowClose", onClose, false);
>+    telemeterWindowClose(win);
>+  }, false);

This will keep the window where this code runs alive when it's closed before 'win'.
(In reply to Dão Gottwald [:dao] from comment #4)
> Well, we can't measure everything, so we may as well stop here.

What?

> See, I don't think your patch is measuring this

It's not.
(In reply to Drew Willcoxon :adw from comment #6)
> (In reply to Dão Gottwald [:dao] from comment #4)
> > Well, we can't measure everything, so we may as well stop here.
> 
> What?

More precisely, we could measure everything but it would make our code base unmanagable. We need to decide what measurements make sense.
Sure.  The only reason I'm working on this is that Dietrich asked me to.
Comment on attachment 582459 [details] [diff] [review]
patch

Rather than changing the various places we open windows, could we simplify this by just using a JS module that uses nsIWindowWatcher.registerNotification to record _all_ window opens, add the load/paint handlers, and then record data accordingly based on window type?
Comment on attachment 582459 [details] [diff] [review]
patch

Dao's right about this entraining windows for too long. Also I wouldn't expect telemeterWindowClose(window) to work very well, since it would be attempting to run JS code on a window after it's been destroyed (or keep the window alive indefinitely, I'm not sure which).
Attachment #582459 - Flags: review?(gavin.sharp) → review-
Attached patch patch (obsolete) — Splinter Review
Attachment #582459 - Attachment is obsolete: true
Attachment #583355 - Flags: review?(gavin.sharp)
Dao, we're measuring this because it's one of the things that some people do complain about (though, it could be due to tab or page loading things, just making it appear to be window). Your experience is not necessarily representative of a typical user's experience. Also, we want to measure all of a set basic primary UI actions to first figure out what is or is not a problem in the wild, and how much the reality differs from our assumptions about these things.

If you have more questions about what we're measuring and why, come join the Snappy meetings on Thursdays, or ask on IRC in #perf. https://etherpad.mozilla.org/snappy
(In reply to Dietrich Ayala (:dietrich) from comment #12)
> Dao, we're measuring this because it's one of the things that some people do
> complain about (though, it could be due to tab or page loading things, just
> making it appear to be window)

What exactly do they complain about?
I have no idea why this patch still observes outer-window-destroyed. Why is this relevant for what people allegedly complain about?
Attachment #583355 - Flags: review-
> Your experience is not necessarily representative of a typical user's experience.

I should clarify. This isn't about what I do or do not experience myself. I have heard people complain about shutdown being slow (bug 662444). I have heard people complain about closing tabs being slow (bug 708673). I haven't heard people complain about closing windows being slow, which matches my expectation that (1) windows should disappear instantaneously even if (2) it can take a significant amount of time until they're destroyed. (2) would be bad if user interaction with another window was blocked throughout this time span, but it's my understanding that this isn't the case. If we have doubts about (1) then let's measure this, although it would be good to see at least a tiny piece of circumstantial evidence backing up these doubts.
Attached patch patch (obsolete) — Splinter Review
Attachment #583355 - Attachment is obsolete: true
Attachment #583355 - Flags: review?(gavin.sharp)
Attachment #583583 - Flags: review?(gavin.sharp)
Comment on attachment 583583 [details] [diff] [review]
patch

This appears to be exactly the same patch as the previous one.
Attachment #583583 - Flags: review-
Attached patch patch (obsolete) — Splinter Review
Attachment #583583 - Attachment is obsolete: true
Attachment #583583 - Flags: review?(gavin.sharp)
Attachment #583601 - Flags: review?(gavin.sharp)
Erm, are you doing this on purpose?
Comment on attachment 583601 [details] [diff] [review]
patch

I think you're generating bad and potentially misleading data here. Please say why you think you're generating good data. Until then my r- is going to stick.
Attachment #583601 - Flags: review-
Attached patch patch (obsolete) — Splinter Review
Attachment #583601 - Attachment is obsolete: true
Attachment #583601 - Flags: review?(gavin.sharp)
Attachment #583607 - Flags: review?(gavin.sharp)
Drew, you can't hide or rewrite history. My r- is still there.
We're measuring window close exactly because we don't know for sure if it's a problem or not. I've experienced slow window close, but like I said before, that doesn't matter at all - we need data from the wild.

We've been consistently wrong (completely, absolutely and totally wrong in some cases) about where we think performance problems exist and why. We're not measuring everything. The list of what to measure came as a result of the Product team talking with SUMO and others (see https://etherpad.mozilla.org/desktop-responsiveness), and from talking to the Performance team.

If the measurement yields no interesting information, we can remove it.

WRT to comment 15: Yes, windows should close without user-perceptible lag. If you disagree, please come to the Snappy meetings - it'd be great to have some more input about what the criteria for reasonable user action responsiveness should be.
We absolutely do need to be careful about how we interpret the data we're gathering - no question. But in a lot of cases, we can't fully make the "good data" vs. "bad data" determination before we've actually collected it, and seen whether the results are interesting. The fact that window destruction time doesn't necessarily correlate to a user-perceivable effects ("window close lag") doesn't mean that it isn't useful to measure. This kind of data could help us detect whether windows are being leaked, for example.
Looking at the patch, I think it would be best to avoid adding a separate outer-window-destroyed listener for each window, and instead track ID+times separately and use a single listener. It might be best at this point to just spin that portion off to a separate bug.
(In reply to Dietrich Ayala (:dietrich) from comment #23)
> We're measuring window close exactly because we don't know for sure if it's
> a problem or not. I've experienced slow window close, but like I said
> before, that doesn't matter at all - we need data from the wild.

What exactly is it you saw? This is critical for measuring it! I don't understand why you say it doesn't matter.

(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #24)
> We absolutely do need to be careful about how we interpret the data we're
> gathering - no question. But in a lot of cases, we can't fully make the
> "good data" vs. "bad data" determination before we've actually collected it,
> and seen whether the results are interesting. The fact that window
> destruction time doesn't necessarily correlate to a user-perceivable effects
> ("window close lag")

"Not necessarily" is an understatement. As far as I can tell, we know it measures something different.

> doesn't mean that it isn't useful to measure. This kind
> of data could help us detect whether windows are being leaked, for example.

Then this bug and the histogram name and description should say so. It's not just that the data is potentially uninteresting, but it can be misleading. Improving perceived window closing performance could mean to destroy windows more lazily and "regress" the numbers collected here.
Attachment #583607 - Attachment is obsolete: true
Attachment #583607 - Flags: review?(gavin.sharp)
Assignee: adw → nobody
Status: ASSIGNED → NEW
(In reply to Dão Gottwald [:dao] from comment #26)
> (In reply to Dietrich Ayala (:dietrich) from comment #23)
> > We're measuring window close exactly because we don't know for sure if it's
> > a problem or not. I've experienced slow window close, but like I said
> > before, that doesn't matter at all - we need data from the wild.
> 
> What exactly is it you saw? This is critical for measuring it! I don't
> understand why you say it doesn't matter.

Usually what happens is that I close the window using the keyboard shortcut, very briefly it flickers visually out of view (but definitely more than 50ms), and the File menu stays highlighted for a period of time.

This is irrelevant to some extent because it may not represent what other users experience, or the behavior on Windows or Linux, and maybe not on Mac either. The measurement should take into account the full breadth of user action -> window finished closing.
(In reply to Dietrich Ayala (:dietrich) from comment #27)
> Usually what happens is that I close the window using the keyboard shortcut,
> very briefly it flickers visually out of view (but definitely more than
> 50ms), and the File menu stays highlighted for a period of time.
> 
> This is irrelevant to some extent because it may not represent what other
> users experience, or the behavior on Windows or Linux, and maybe not on Mac
> either. The measurement should take into account the full breadth of user
> action -> window finished closing.

Sure, but if we have evidence for specific pain points, we can try pin them down directly. The lagged menu highlighting sounds like a Mac-specific quirk. I don't know to what extent we control that. Might be worth filing a Core/Widget: Cocoa bug on. The window flicker could actually fall into the same category.
(In reply to Dietrich Ayala (:dietrich) from comment #27)
> The measurement should take into account the full breadth of user
> action -> window finished closing.

Dao's point is that nsGlobalWindow destruction does not at all map to "window finished closing", from a user's point of view. (Arguably, the end of the "close" event handler probably doesn't map very well to "user action", either - but getting very precise measurements is going to be tricky, so in a lot of cases we need to settle for the best we can get.)

He's right about this, and it's good to be clear about what we're measuring ahead of time, lest there be any confusion in interpretation of the data down the line.
I have no idea about how to implement this, haven't even looked at drew's patch, so you don't have to sell me on what you're saying :)

My goal here is to measure how long it takes to close windows, from the user's perspective. Of course I'd like it to be done in a way that makes sense, and provide data that is useful.

Are you saying that it is just not possible?
Blocks: 671038
Narrowing this bug to window open, since there's no agreement on whether we should measure window close. Neil is going to take a look at it.
Assignee: nobody → enndeakin
Summary: Telemetry for browser window open and close → Telemetry for browser window open
This patch starts the timer when the new window command is issued, and uses the first document-shown notification as the end time. The sequence of new window opening is that after the native widget is shown, then painted, this notification is fired. Testing shows that it fires before the content document is loaded and painted.
Attachment #597032 - Flags: review?(dietrich)
Comment on attachment 597032 [details] [diff] [review]
Patch to add new window telemetry

>+      Services.obs.removeObserver(newDocumentShown, "document-shown", false);

removeObserver takes two arguments only.
Comment on attachment 597032 [details] [diff] [review]
Patch to add new window telemetry

Review of attachment 597032 [details] [diff] [review]:
-----------------------------------------------------------------

looks good, just fix these and Dao's comments.

::: browser/base/content/browser.js
@@ +3687,5 @@
> +      Services.obs.removeObserver(newDocumentShown, "document-shown", false);
> +      histogram.add(Date.now() - startTime);
> +    }
> +  };
> +  Services.obs.addObserver(newDocumentShown, "document-shown", false);

use TelemetryStopwatch instead:

http://felipe.wordpress.com/2012/02/13/telemetry-stopwatch/

::: toolkit/components/telemetry/TelemetryHistograms.h
@@ +325,5 @@
>  HISTOGRAM_BOOLEAN(FX_KEYWORD_URL_USERSET, "Firefox: keyword.URL has a user-set value")
>  HISTOGRAM(FX_IDENTITY_POPUP_OPEN_MS, 1, 1000, 10, EXPONENTIAL, "Firefox: Time taken by the identity popup to open in milliseconds")
>  HISTOGRAM(FX_APP_MENU_OPEN_MS, 1, 1000, 10, EXPONENTIAL, "Firefox: Time taken by the app-menu opening in milliseconds")
>  HISTOGRAM(FX_BOOKMARKS_TOOLBAR_INIT_MS, 50, 5000, 10, EXPONENTIAL, "Firefox: Time to initialize the bookmarks toolbar view (ms)")
> +HISTOGRAM(FX_NEW_WINDOW, 1, 10000, 20, EXPONENTIAL, "Firefox: Time taken to open a new browser window")

add the unit (ms) to the description
Attachment #597032 - Flags: review?(dietrich) → feedback+
Attached patch updated patch (obsolete) — Splinter Review
Attachment #597032 - Attachment is obsolete: true
Attachment #602019 - Flags: review?(dao)
Comment on attachment 602019 [details] [diff] [review]
updated patch

Review of attachment 602019 [details] [diff] [review]:
-----------------------------------------------------------------

bah, missed this in the last pass: please include the units in the key name as well, eg: + "_MS"
Attached patch updated patch 2 (obsolete) — Splinter Review
Attachment #602019 - Attachment is obsolete: true
Attachment #602019 - Flags: review?(dao)
Attachment #602029 - Flags: review?(dao)
Comment on attachment 602029 [details] [diff] [review]
updated patch 2

>+Cu.import("resource:///modules/TelemetryStopwatch.jsm");

This can be done lazily using XPCOMUtils.defineLazyModuleGetter (and should be moved down to the other lazily loaded modules).

> function OpenBrowserWindow()
> {
>+  TelemetryStopwatch.start("FX_NEW_WINDOW_MS");
>+  var histogram = Services.telemetry.getHistogramById("FX_NEW_WINDOW_MS");

'histogram' is unused

>+  function newDocumentShown(doc, topic, data) {
>+    if (topic == "document-shown" && doc != document &&
>+        doc.location == "chrome://browser/content/browser.xul") {
>+      Services.obs.removeObserver(newDocumentShown, "document-shown");
>+      TelemetryStopwatch.finish("FX_NEW_WINDOW_MS");
>+    }
>+  };

What happens when opening two windows simultaneously? It looks like TelemetryStopwatch.start will refuse to operate in that case. A saner API would probably allow both calls and return handles so that both can be finished independently...
> What happens when opening two windows simultaneously? It looks like
> TelemetryStopwatch.start will refuse to operate in that case. A saner API
> would probably allow both calls and return handles so that both can be
> finished independently...

Apparently TelemetryStopwatch.start accepts a second argument and will use it as something like a handle. So I guess you should create a dummy object for this.
Attached patch updated patch 3 (obsolete) — Splinter Review
Attachment #602029 - Attachment is obsolete: true
Attachment #602029 - Flags: review?(dao)
Attachment #602993 - Flags: review?(dao)
You can use defineLazyModuleGetter for the lazy getter instead of rolling your own, fwiw.
Attached patch address gavin's comment (obsolete) — Splinter Review
Attachment #602993 - Attachment is obsolete: true
Attachment #602993 - Flags: review?(dao)
Attachment #603003 - Flags: review?(dao)
Comment on attachment 603003 [details] [diff] [review]
address gavin's comment

> function OpenBrowserWindow()
> {
>+  var telemetryObj = {};
>+  TelemetryStopwatch.start("FX_NEW_WINDOW_MS", telemetryObj);
>+
>+  function newDocumentShown(doc, topic, data) {
>+    if (topic == "document-shown" && doc != document &&

nit: new line after the first &&

>+        doc.location == "chrome://browser/content/browser.xul" &&
>+        doc.defaultView == telemetryObj.window) {

This can be just doc.defaultView == win

>+  telemetryObj.window = win;

And this can be removed.
Attachment #603003 - Flags: review?(dao) → review+
Attached patch updated patch (obsolete) — Splinter Review
For some reason some windows cause an error when trying to access the location, but I don't think the location check is needed.

Bug 732564 wanted to remove the document-shown notification, but I'm not sure if there's something that can be used instead.
Attachment #603003 - Attachment is obsolete: true
Attachment #604409 - Attachment is obsolete: true
Attachment #605752 - Flags: review?
Attachment #605752 - Flags: review? → review?(dao)
Attachment #605752 - Flags: review?(dao) → review+
https://hg.mozilla.org/mozilla-central/rev/07ed71d1e7ab
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 14
So should this actually work? I don't see FX_NEW_WINDOW_MS in Telemetry Evolution.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: