1388159 - Improve the measurement in tpaint

Reporter

Description

•

8 years ago

Currently, tpaint is measuring the wrong thing by taking `performance.now` in the callback to MozAfterPaint. The problem is that MozAfterPaint is called asynchronously some time after paint happens, with no guarantees as to when. In the long term, we'll want to switch tpaint to use timeToNonBlankPaint (bug 1381988), but first we'd like to fix when we capture that (bug 1388157). So, until then, I'd like to improve the tpaint by taking MozAfterPaint's event.paintTimeStamp, instead of `performance.now`. This is an easy fix, and should get us very close to the final number making the final patch not affect the numbers as drastically.

Zibi Braniecki [:zbraniecki][:gandalf]

Reporter

Updated

•

8 years ago

Assignee: nobody → gandalf

Status: NEW → ASSIGNED

Comment hidden (mozreview-request)

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

•

8 years ago

mozreview-review

Comment on attachment 8894643 [details] Bug 1388159 - Improve the measurement in tpaint. https://reviewboard.mozilla.org/r/165808/#review170924 thanks, this is easy to follow.

Attachment #8894643 - Flags: review?(jmaher) → review+

Zibi Braniecki [:zbraniecki][:gandalf]

Reporter

Comment 3

•

8 years ago

Thanks Joel! Let's wait for the try run: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f1c4d0f8ab20&newProject=try&newRevision=b80763bb6c03729cc07ff22fdf5309ff07877d17&framework=1&showOnlyImportant=0

Zibi Braniecki [:zbraniecki][:gandalf]

Reporter

Comment 4

•

8 years ago

aand that will not work, because MozAfterPaint's evt.paintTimeStamp is chrome-only :( I'm not sure if I know how to attempt to use any specialpowers or other tricks to get access to it, since both the tpaint runner and tpaint documents are content only. We may hit a roadblock here and either have to get paintTimeStamp to content or wait for it to be expressed via performance.timing. :jmaher - do you know who might be the right person to decide and potentially implement such a change? DOM code is way outside of my comfort zone.

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 5

•

8 years ago

we have talospowers available to use :) To answer your question, I don't know exactly who, I would start with :overholt. I guess if TalosPowers doesn't solve this current problem (which it might not since it would have data for chrome and probably not content)

Flags: needinfo?(jmaher)

Zibi Braniecki [:zbraniecki][:gandalf]

Reporter

Updated

•

8 years ago

Flags: needinfo?(gijskruitbosch+bugs)

:Gijs (he/him)

Comment 6

•

8 years ago

I'm not sure I understand what the issue is here. tabpaint ( https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/tabpaint/content/target.html#49 ) looks to me like it's already using timeToNonBlankPaint. Has something just changed in the intervening 6 months? Or is tpaint doing things radically differently from tabpaint? Could we add the event listener from a message manager script instead of from the content, if that's the problem? It looks like target.html gets loaded over chrome:, and presumably the tpaint/ts_paint content document doesn't. Changing that would work, though we'd technically not be testing the http or file: overhead anymore (assuming that's currently how we load the document) - I'd be a bit surprised if that was significant though... Alternatively, we could use a content process script to detect the document object being created and attach the listener in there, and check the paint time for it. :gandalf, would you have cycles to drive this over the line given this info?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gandalf)

:Gijs (he/him)

Comment 7

•

7 years ago

Tarek / :rwood, any chance of moving this bug along? It would be nice to make our measurements more reliable.

Flags: needinfo?(tarek)

Flags: needinfo?(rwood)

Joel Maher ( :jmaher ) (UTC -8)

Comment 8

•

7 years ago

tpaint: https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/tpaint/tpaint.html opens a new window with a small bit of html/js to report it was opened: var kidHTML = "<html><meta charset='utf-8'><script>" + "var e = 'MozAfterPaint';" + "function done() {" + " window.removeEventListener(e, done, true);" + " window.opener.postMessage({msg: 'childIsOpen'}, '*');" + " window.close();" + "}" + "window.addEventListener(e, done, true);" + "</" + "script>TPAINT</html>"; is this something that we would want to listen for something besides MozAfterPaint such as: content.window.performance.timing.timeToNonBlankPaint would we need different content to be displayed in the new window?

Tarek Ziadé (:tarek)

Updated

•

7 years ago

Flags: needinfo?(tarek)

Robert Wood [:rwood]

Comment 9

•

7 years ago

I'm not really sure what the state of this is or the desired action is...

Flags: needinfo?(rwood)

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 10

•

7 years ago

:gandalf is this bug still valid given there's been little activity in the last 10 months?

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

7 years ago

Priority: -- → P2

Brendan Dahl [:bdahl]

Assignee

Comment 11

•

7 years ago

I started filing another bug about this same issue. In work on transitioning from browser.xul to browser.xhtml I've seen strange results where nearly all talos tests improve but tpaint moves in the opposite direction and I thinks it's because of the strange way we measure tpaint.

I'm happy to take over work on it, but I would like to clarify a few things. Do we want to measure the time it takes to open a new window and paint the 1) browser chrome, 2) the content, or 3) both?

Flags: needinfo?(jmaher)

Brian Grinstead [:bgrins]

Updated

•

7 years ago

Blocks: top-level-html

Zibi Braniecki [:zbraniecki][:gandalf]

Reporter

Comment 12

•

7 years ago

My understanding is that we want to measure the browser chrome, so (1).

Flags: needinfo?(gandalf)

Brian Grinstead [:bgrins]

Comment 13

•

7 years ago

(In reply to Brendan Dahl [:bdahl] from comment #11)

I started filing another bug about this same issue. In work on transitioning from browser.xul to browser.xhtml I've seen strange results where nearly all talos tests improve but tpaint moves in the opposite direction and I thinks it's because of the strange way we measure tpaint.

I'm happy to take over work on it, but I would like to clarify a few things. Do we want to measure the time it takes to open a new window and paint the 1) browser chrome, 2) the content, or 3) both?

This seems like a "what do we want to measure" question. Vicky, do you or your team have an opinion about this? Specifically: do we care about the browser window itself rendering, or the content inside of it rendering when tracking the time it takes to open a new window?

Flags: needinfo?(vchin)

Joel Maher ( :jmaher ) (UTC -8)

Comment 14

•

7 years ago

the intention of tpaint is painting a new window, there really isn't content to be displayed, so I vote for what :gandalf says, #1

Flags: needinfo?(jmaher)

:Gijs (he/him)

Comment 15

•

7 years ago

IMHO, one of the other things that's a bit surprising about tpaint is that it uses window.open(), instead of OpenBrowserWindow(). That is, it's equivalent to a webpage calling window.open() with either such "flags" params that force a new window instead of new tab (e.g. height/width), or having turned off the default "open new windows as new tabs". It's not equivalent to using accel-n to just open a new browser window as a user. Then again, it might not be easy to change that given it'll change the baseline. Perhaps we want a new test, esp. if we also care about how quickly activity stream is displayed in the window that then opens...

Brendan Dahl [:bdahl]

Assignee

Comment 16

•

7 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #14)

the intention of tpaint is painting a new window, there really isn't content to be displayed, so I vote for what :gandalf says, #1

We currently use the mozafterpaint in the child, which can now happen before or after the mozafterpaint in the parent, so we're really measuring content load time + the lag time of the message being received. I'm guessing that didn't use to be the case when everything was one process, but we should record the time of the parent process mozafterpaint (or first non blank paint) instead.

Brendan Dahl [:bdahl]

Assignee

Updated

•

7 years ago

Assignee: gandalf → bdahl

Markus Stange [:mstange]

Comment 17

•

7 years ago

(In reply to Brian Grinstead [:bgrins] from comment #13)

(In reply to Brendan Dahl [:bdahl] from comment #11)

I'm happy to take over work on it, but I would like to clarify a few things. Do we want to measure the time it takes to open a new window and paint the 1) browser chrome, 2) the content, or 3) both?

This seems like a "what do we want to measure" question. Vicky, do you or your team have an opinion about this? Specifically: do we care about the browser window itself rendering, or the content inside of it rendering when tracking the time it takes to open a new window?

Bas, Vicky and I discussed this bug this morning. We recommend the following:

The test should measure how long it takes for the user to open a browser window until the browser chrome (not including the content) first paints.
Please follow Gijs' suggestion from comment 15 and use OpenBrowserWindow() instead.
Listen for the MozAfterPaint event on the browser window in the parent process and record its paintTimeStamp. This is actually the compositeEnd timestamp, which is exactly what you want to measure.
Do not use timeToFirstNonBlankPaint. That one is designed for pageload and not for browser windows.
Independent from this, it might be a good idea to rename the test to twinopen because it's really about measuring window opening times and not generic paint times, and the current name seems to be confusing people.

Flags: needinfo?(vchin)

Zibi Braniecki [:zbraniecki][:gandalf]

Reporter

Comment 18

•

7 years ago

Listen for the MozAfterPaint event on the browser window in the parent process and record its paintTimeStamp. This is actually the compositeEnd timestamp, which is exactly what you want to measure.

Can you elaborate here? Why do we want to measure compositeEnd, and not paintEnd?
The way I understood the problem last time I investigated it was that we want paintEnd, but we can't get it because our OMTP doesn't provide the timestamp back.
I'm ok getting compositeEnd for now, but I'd like to understand if it's something we'd like to improve in the future or for some reason not.

I understand that W3C Paint Timing API [0] is for pages, but I hope that it's definition of paint applies to our chrome scenario as well, and it would make it easier to reason about our performance metrics if we aligned all timestamps to either compositeEnd or paintEnd unless there's a reason not to?

[0] https://w3c.github.io/paint-timing/#paint

Flags: needinfo?(mstange)

Markus Stange [:mstange]

Comment 19

•

7 years ago

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #18)

Listen for the MozAfterPaint event on the browser window in the parent process and record its paintTimeStamp. This is actually the compositeEnd timestamp, which is exactly what you want to measure.

Can you elaborate here? Why do we want to measure compositeEnd, and not paintEnd?

Sure! compositeEnd is after paintEnd and includes more of the painting pipeline. It's the timestamp that comes closest to "pixels hit the screen" of all the timestamps that we have in Gecko. We want to capture as much of the painting pipeline as possible in the measurements, so that, for example if WebRender moves work from the paint phase to the composite phase, things don't just suddenly look like they got magically faster.

I understand that W3C Paint Timing API [0] is for pages, but I hope that it's definition of paint applies to our chrome scenario as well,

The chrome scenario is a bit different, though. Pages usually show a plain background color first, then they paint in some partial state, at some point their load event fires, and at some point they're complete. For chrome window opening however, on some platforms we even wait for the load event before we make the window visible at all, and then it's the very first paint whose timestamp we care about, regardless of what's on the screen at that point. My fear is just that as we tweak FCP detection to make sense for pages, the detection will make less sense for window opening. If we want to expose an analogous API for window paint timing, I think we should call it something else and not try to reuse FCP detection for it.

and it would make it easier to reason about our performance metrics if we aligned all timestamps to either compositeEnd or paintEnd unless there's a reason not to?

I agree that it would be valuable to align all these metrics. Aligning on compositeEnd would make sense, I think. That would also match the intention mentioned in the spec: "the timestamp should be the latest timestamp the browser is able to note in this pipeline (best effort). Typically the time at which the frame is submitted to the OS for display is recommended for this API."

[0] https://w3c.github.io/paint-timing/#paint

Flags: needinfo?(mstange)

Brian Grinstead [:bgrins]

Comment 20

•

7 years ago

(In reply to Markus Stange [:mstange] from comment #17)

Independent from this, it might be a good idea to rename the test to twinopen because it's really about measuring window opening times and not generic paint times, and the current name seems to be confusing people.

Joel, what are the requirements for renaming an existing talos test? I feel like we used to have some external data sources / repos that needed to get updated in addition to updating references in code in m-c, but maybe those aren't required anymore?

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 21

•

7 years ago

:igoldan, you recently changed some names for raptor-tp6 in treeherder, is this something you could help out with for tpaint -> twinopen?

Flags: needinfo?(jmaher) → needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Comment 22

•

7 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #21)

:igoldan, you recently changed some names for raptor-tp6 in treeherder, is this something you could help out with for tpaint -> twinopen?

Yes, I can help with this. I will have to prepare a SQL script to point perf data form old tpaint to new twinopen one.
That way, Perf sheriffs have continuous graphs when investigating twinopen regressions.

Flags: needinfo?(igoldan)

Brendan Dahl [:bdahl]

Assignee

Comment 23

•

6 years ago

Attached file Bug 1388159 - New talos test to replace tpaint. r=jmaher — Details

Currently, tpaint measures from window.open until a message is
received that the MozAfterPaint event has fired from the content
in the new window. This has several issues:

the MozAfterPaint event is dispatched asynchronously and it's
not the actual time the paint occurred
it adds the lag time of sending a message
it uses MozAfterPaint from the content page
window.open is not the default code path that opening a new window
follows in Firefox

In the new measurement, twinopen measures from OpenBrowserWindow (which
is closer to ctrl+n) to browser chrome's paintTimeStamp from
MozAfterPaint.

Brendan Dahl [:bdahl]

Assignee

Comment 24

•

6 years ago

I'm not sure if we want to migrate the tpaint numbers as they really do measure something else. For now, I've kept tpaint and added a new twinopen test. After twinopen builds some history and has stabilized we could then remove tpaint.

Joel Maher ( :jmaher ) (UTC -8)

Comment 25

•

6 years ago

sounds good!

Brendan Dahl [:bdahl]

Assignee

Comment 26

•

6 years ago

Ionut, is there anything I need to do before landing this?

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Comment 27

•

6 years ago

(In reply to Brendan Dahl [:bdahl] from comment #26)

Ionut, is there anything I need to do before landing this?

No. Simply land this and Perfherder will automatically pick the new twinopen results.

Flags: needinfo?(igoldan)

Brian Grinstead [:bgrins]

Updated

•

6 years ago

No longer blocks: top-level-html

Pulsebot

Comment 28

•

6 years ago

Pushed by bdahl@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ad21d7147c00 New talos test to replace tpaint. r=jmaher

Bogdan Tara[:bogdan_tara | bogdant]

Comment 29

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/ad21d7147c00

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox67: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla67

Brendan Dahl [:bdahl]

Assignee

Updated

•

6 years ago

Blocks: 1544612

Bug 1388159 - Improve the measurement in tpaint. 8 years ago Zibi Braniecki [:zbraniecki][:gandalf] 59 bytes, text/x-review-board-request	jmaher : review+	Details
Bug 1388159 - New talos test to replace tpaint. r=jmaher 6 years ago Brendan Dahl [:bdahl] 47 bytes, text/x-phabricator-request		Details \| Review