Closed Bug 1388159 Opened 8 years ago Closed 6 years ago

Improve the measurement in tpaint

Categories

(Testing :: Talos, enhancement, P2)

Version 3
enhancement

Tracking

(firefox67 fixed)

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: zbraniecki, Assigned: bdahl)

References

Details

Attachments

(2 files)

Currently, tpaint is measuring the wrong thing by taking `performance.now` in the callback to MozAfterPaint. The problem is that MozAfterPaint is called asynchronously some time after paint happens, with no guarantees as to when. In the long term, we'll want to switch tpaint to use timeToNonBlankPaint (bug 1381988), but first we'd like to fix when we capture that (bug 1388157). So, until then, I'd like to improve the tpaint by taking MozAfterPaint's event.paintTimeStamp, instead of `performance.now`. This is an easy fix, and should get us very close to the final number making the final patch not affect the numbers as drastically.
Assignee: nobody → gandalf
Status: NEW → ASSIGNED
Comment on attachment 8894643 [details] Bug 1388159 - Improve the measurement in tpaint. https://reviewboard.mozilla.org/r/165808/#review170924 thanks, this is easy to follow.
Attachment #8894643 - Flags: review?(jmaher) → review+
aand that will not work, because MozAfterPaint's evt.paintTimeStamp is chrome-only :( I'm not sure if I know how to attempt to use any specialpowers or other tricks to get access to it, since both the tpaint runner and tpaint documents are content only. We may hit a roadblock here and either have to get paintTimeStamp to content or wait for it to be expressed via performance.timing. :jmaher - do you know who might be the right person to decide and potentially implement such a change? DOM code is way outside of my comfort zone.
Flags: needinfo?(jmaher)
we have talospowers available to use :) To answer your question, I don't know exactly who, I would start with :overholt. I guess if TalosPowers doesn't solve this current problem (which it might not since it would have data for chrome and probably not content)
Flags: needinfo?(jmaher)
Flags: needinfo?(gijskruitbosch+bugs)
I'm not sure I understand what the issue is here. tabpaint ( https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/tabpaint/content/target.html#49 ) looks to me like it's already using timeToNonBlankPaint. Has something just changed in the intervening 6 months? Or is tpaint doing things radically differently from tabpaint? Could we add the event listener from a message manager script instead of from the content, if that's the problem? It looks like target.html gets loaded over chrome:, and presumably the tpaint/ts_paint content document doesn't. Changing that would work, though we'd technically not be testing the http or file: overhead anymore (assuming that's currently how we load the document) - I'd be a bit surprised if that was significant though... Alternatively, we could use a content process script to detect the document object being created and attach the listener in there, and check the paint time for it. :gandalf, would you have cycles to drive this over the line given this info?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gandalf)
Tarek / :rwood, any chance of moving this bug along? It would be nice to make our measurements more reliable.
Flags: needinfo?(tarek)
Flags: needinfo?(rwood)
tpaint: https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/tpaint/tpaint.html opens a new window with a small bit of html/js to report it was opened: var kidHTML = "<html><meta charset='utf-8'><script>" + "var e = 'MozAfterPaint';" + "function done() {" + " window.removeEventListener(e, done, true);" + " window.opener.postMessage({msg: 'childIsOpen'}, '*');" + " window.close();" + "}" + "window.addEventListener(e, done, true);" + "</" + "script>TPAINT</html>"; is this something that we would want to listen for something besides MozAfterPaint such as: content.window.performance.timing.timeToNonBlankPaint would we need different content to be displayed in the new window?
Flags: needinfo?(tarek)
I'm not really sure what the state of this is or the desired action is...
Flags: needinfo?(rwood)
:gandalf is this bug still valid given there's been little activity in the last 10 months?
Priority: -- → P2

I started filing another bug about this same issue. In work on transitioning from browser.xul to browser.xhtml I've seen strange results where nearly all talos tests improve but tpaint moves in the opposite direction and I thinks it's because of the strange way we measure tpaint.

I'm happy to take over work on it, but I would like to clarify a few things. Do we want to measure the time it takes to open a new window and paint the 1) browser chrome, 2) the content, or 3) both?

Flags: needinfo?(jmaher)

My understanding is that we want to measure the browser chrome, so (1).

Flags: needinfo?(gandalf)

(In reply to Brendan Dahl [:bdahl] from comment #11)

I started filing another bug about this same issue. In work on transitioning from browser.xul to browser.xhtml I've seen strange results where nearly all talos tests improve but tpaint moves in the opposite direction and I thinks it's because of the strange way we measure tpaint.

I'm happy to take over work on it, but I would like to clarify a few things. Do we want to measure the time it takes to open a new window and paint the 1) browser chrome, 2) the content, or 3) both?

This seems like a "what do we want to measure" question. Vicky, do you or your team have an opinion about this? Specifically: do we care about the browser window itself rendering, or the content inside of it rendering when tracking the time it takes to open a new window?

Flags: needinfo?(vchin)

the intention of tpaint is painting a new window, there really isn't content to be displayed, so I vote for what :gandalf says, #1

Flags: needinfo?(jmaher)

IMHO, one of the other things that's a bit surprising about tpaint is that it uses window.open(), instead of OpenBrowserWindow(). That is, it's equivalent to a webpage calling window.open() with either such "flags" params that force a new window instead of new tab (e.g. height/width), or having turned off the default "open new windows as new tabs". It's not equivalent to using accel-n to just open a new browser window as a user. Then again, it might not be easy to change that given it'll change the baseline. Perhaps we want a new test, esp. if we also care about how quickly activity stream is displayed in the window that then opens...

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #14)

the intention of tpaint is painting a new window, there really isn't content to be displayed, so I vote for what :gandalf says, #1

We currently use the mozafterpaint in the child, which can now happen before or after the mozafterpaint in the parent, so we're really measuring content load time + the lag time of the message being received. I'm guessing that didn't use to be the case when everything was one process, but we should record the time of the parent process mozafterpaint (or first non blank paint) instead.

Assignee: gandalf → bdahl

(In reply to Brian Grinstead [:bgrins] from comment #13)

(In reply to Brendan Dahl [:bdahl] from comment #11)

I'm happy to take over work on it, but I would like to clarify a few things. Do we want to measure the time it takes to open a new window and paint the 1) browser chrome, 2) the content, or 3) both?

This seems like a "what do we want to measure" question. Vicky, do you or your team have an opinion about this? Specifically: do we care about the browser window itself rendering, or the content inside of it rendering when tracking the time it takes to open a new window?

Bas, Vicky and I discussed this bug this morning. We recommend the following:

  • The test should measure how long it takes for the user to open a browser window until the browser chrome (not including the content) first paints.
  • Please follow Gijs' suggestion from comment 15 and use OpenBrowserWindow() instead.
  • Listen for the MozAfterPaint event on the browser window in the parent process and record its paintTimeStamp. This is actually the compositeEnd timestamp, which is exactly what you want to measure.
  • Do not use timeToFirstNonBlankPaint. That one is designed for pageload and not for browser windows.
  • Independent from this, it might be a good idea to rename the test to twinopen because it's really about measuring window opening times and not generic paint times, and the current name seems to be confusing people.
Flags: needinfo?(vchin)

Listen for the MozAfterPaint event on the browser window in the parent process and record its paintTimeStamp. This is actually the compositeEnd timestamp, which is exactly what you want to measure.

Can you elaborate here? Why do we want to measure compositeEnd, and not paintEnd?
The way I understood the problem last time I investigated it was that we want paintEnd, but we can't get it because our OMTP doesn't provide the timestamp back.
I'm ok getting compositeEnd for now, but I'd like to understand if it's something we'd like to improve in the future or for some reason not.

I understand that W3C Paint Timing API [0] is for pages, but I hope that it's definition of paint applies to our chrome scenario as well, and it would make it easier to reason about our performance metrics if we aligned all timestamps to either compositeEnd or paintEnd unless there's a reason not to?

[0] https://w3c.github.io/paint-timing/#paint

Flags: needinfo?(mstange)

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #18)

Listen for the MozAfterPaint event on the browser window in the parent process and record its paintTimeStamp. This is actually the compositeEnd timestamp, which is exactly what you want to measure.

Can you elaborate here? Why do we want to measure compositeEnd, and not paintEnd?

Sure! compositeEnd is after paintEnd and includes more of the painting pipeline. It's the timestamp that comes closest to "pixels hit the screen" of all the timestamps that we have in Gecko. We want to capture as much of the painting pipeline as possible in the measurements, so that, for example if WebRender moves work from the paint phase to the composite phase, things don't just suddenly look like they got magically faster.

I understand that W3C Paint Timing API [0] is for pages, but I hope that it's definition of paint applies to our chrome scenario as well,

The chrome scenario is a bit different, though. Pages usually show a plain background color first, then they paint in some partial state, at some point their load event fires, and at some point they're complete. For chrome window opening however, on some platforms we even wait for the load event before we make the window visible at all, and then it's the very first paint whose timestamp we care about, regardless of what's on the screen at that point. My fear is just that as we tweak FCP detection to make sense for pages, the detection will make less sense for window opening. If we want to expose an analogous API for window paint timing, I think we should call it something else and not try to reuse FCP detection for it.

and it would make it easier to reason about our performance metrics if we aligned all timestamps to either compositeEnd or paintEnd unless there's a reason not to?

I agree that it would be valuable to align all these metrics. Aligning on compositeEnd would make sense, I think. That would also match the intention mentioned in the spec: "the timestamp should be the latest timestamp the browser is able to note in this pipeline (best effort). Typically the time at which the frame is submitted to the OS for display is recommended for this API."

[0] https://w3c.github.io/paint-timing/#paint

Flags: needinfo?(mstange)

(In reply to Markus Stange [:mstange] from comment #17)

  • Independent from this, it might be a good idea to rename the test to twinopen because it's really about measuring window opening times and not generic paint times, and the current name seems to be confusing people.

Joel, what are the requirements for renaming an existing talos test? I feel like we used to have some external data sources / repos that needed to get updated in addition to updating references in code in m-c, but maybe those aren't required anymore?

Flags: needinfo?(jmaher)

:igoldan, you recently changed some names for raptor-tp6 in treeherder, is this something you could help out with for tpaint -> twinopen?

Flags: needinfo?(jmaher) → needinfo?(igoldan)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #21)

:igoldan, you recently changed some names for raptor-tp6 in treeherder, is this something you could help out with for tpaint -> twinopen?

Yes, I can help with this. I will have to prepare a SQL script to point perf data form old tpaint to new twinopen one.
That way, Perf sheriffs have continuous graphs when investigating twinopen regressions.

Flags: needinfo?(igoldan)

Currently, tpaint measures from window.open until a message is
received that the MozAfterPaint event has fired from the content
in the new window. This has several issues:

  1. the MozAfterPaint event is dispatched asynchronously and it's
    not the actual time the paint occurred
  2. it adds the lag time of sending a message
  3. it uses MozAfterPaint from the content page
  4. window.open is not the default code path that opening a new window
    follows in Firefox

In the new measurement, twinopen measures from OpenBrowserWindow (which
is closer to ctrl+n) to browser chrome's paintTimeStamp from
MozAfterPaint.

I'm not sure if we want to migrate the tpaint numbers as they really do measure something else. For now, I've kept tpaint and added a new twinopen test. After twinopen builds some history and has stabilized we could then remove tpaint.

sounds good!

Ionut, is there anything I need to do before landing this?

Flags: needinfo?(igoldan)

(In reply to Brendan Dahl [:bdahl] from comment #26)

Ionut, is there anything I need to do before landing this?

No. Simply land this and Perfherder will automatically pick the new twinopen results.

Flags: needinfo?(igoldan)
No longer blocks: top-level-html
Pushed by bdahl@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ad21d7147c00 New talos test to replace tpaint. r=jmaher
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Blocks: 1544612
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: