Closed Bug 1179377 Opened 5 years ago Closed 3 years ago

Investigate if tpaint could be a bit less noisy

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: smaug, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Attached patch wipSplinter Review
The values tpaint reports tend to be a bit noisy, and it isn't quite clear from the test itself what it tries to test: Is it about painting the first web page after opening a new window, or is it about painting the new window?

The patch tries to open window at a bit more stable time
(right after animation frame callback), and not care about page loading -
just add event listener to the newly created window object asap.
From our earlier discussion on IRC, it sounds like tpaint was capturing the first content paint rather than the first chrome paint (which maybe happened to be together before the favor perf removal patch of bug 930793).

Since tpaint is about displaying the window itself rather than the content (which becomes even more clear in e10s), then the first goal should be to fix the test itself. If that also makes the noise level lower - great.
Blocks: 930793
Actually, looking at the code* it seems like the goal is to measure from "trigger new window" to "window content being rendered", however, looking t the docs** it says "Tests the amount of time it takes the open a new window". These two are actually different things, and arguably both of them are independently important.

*  https://hg.mozilla.org/build/talos/diff/b582727872a6/talos/startup_test/tpaint.html#l1.16
** https://wiki.mozilla.org/Buildbot/Talos/Tests#tpaint

Joel, since you wrote this test, what's your take on this? should it be measuring the time it takes to render the chrome? or the content? and does the latter takes e10s into account correctly?
Flags: needinfo?(jmaher)
As it stands tpaint is not a test which I wrote, although I did update it in bug 896243 as requested by :dolske.

As for changing the test, yes we can easily do that.  Currently, it should be measuring just the time to open a new chrome window, not to load any content.

looking at the noise, linux is pretty stable:
http://graphs.mozilla.org/graph.html#tests=[[82,1,43],[82,1,35],[82,1,41],[82,1,33]]&sel=1428043082558,1435819082558&displayrange=90&datatype=geo

osx is sort of noisy, yet possible to see changes:
http://graphs.mozilla.org/graph.html#tests=[[82,1,57],[82,1,55]]&sel=1428043082558,1435819082558&displayrange=90&datatype=geo

windows is pretty stable:
http://graphs.mozilla.org/graph.html#tests=[[82,1,49],[82,1,47],[82,1,25],[82,1,31],[82,1,45],[82,1,37]]&sel=1428043082558,1435819082558&displayrange=90&datatype=geo

so out of our 6 platforms, 1 is noisy- not really worth changing the test over due to noise.

what should this test be doing?  I believe we should be measuring the chrome window opening, we have plenty of other tests measuring page load parts.
Flags: needinfo?(jmaher)
It certainly loads content. That is what the data: url is about.
So, the current tpaint code measures the content's (data: URI) mozAfterPaint after opening a new window, and smaug's patch changes that to measuring "mozafterpaint asap there is a content Window object" (so likely before the content is loaded).

smaug's approach is good IMO, and possibly better than the current code since it isolates measurement of window chrome opening and rendering from content loading.

The question we need to answer IMO is: should the new code replace the old code? or be added as another test (e.g. tpaint_chrome)?

pros for replacing:
- we don't add a new test (saves automation resources and setups).
- we already have enough tests to measure content-loading + rendering so no need to include that in a test where the main goal is measuring opening a new window.

cons for replacing:
- new window + content rendering is actually a useful thing to measure, and possibly even important to measure with e10s, and replacing the code will lose that.
- we lose ability to track older tpaint results - because the new tpaint will measure a different thing.

Overall, despite the pros and especially because the cons are non-negligible, I tend to think we should add smaug's code as a new test and keep the old tpaint test as well.

Thoughts?
Flags: needinfo?(vdjeric)
Flags: needinfo?(mconley)
Flags: needinfo?(jmaher)
Flags: needinfo?(bugs)
it is probably best to add a new test- although I am on the fence given the slight distinction between both tests.  If we do this, I would like to ensure that 6 months from now we review the historical data and consider getting rid of the old one unless we have different enough results and trends to otherwise keep it.
Flags: needinfo?(jmaher)
I think the distinction between the painting of the window chrome and the painting of content is important, but for the user, the most important thing is probably showing the content. Measuring the latter is just (if not more) important than measuring the former.

So here's what I suggest:

1) Create a new test to measure the time to paint the window chrome
2) Consider modifying the old test to load / render more realistic content - specifically, about:home, since this is the default document to load on new window.

Note that (2) opens us up for fluctuation if about:home changes, but maybe that's a good thing. As this is the default, it needs to be fast. Knowing when it gets slow is valuable, IMO.
Flags: needinfo?(mconley)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #7)
> 2) Consider modifying the old test to load / render more realistic content -
> specifically, about:home, since this is the default document to load on new
> window.
> 
> Note that (2) opens us up for fluctuation if about:home changes, but maybe
> that's a good thing. As this is the default, it needs to be fast. Knowing
> when it gets slow is valuable, IMO.

While I also think it's a good idea to measure about:home load time, it's probably better to consider it as a test of its own without the overhead of opening a new window.

The talos tests _try_ to isolate components/things, and the tpaint test tries to isolate the performance of opening a new window. Adding about:home would add unrelated noise, and we can measure about:home in a simple page-load tests unrelated to new window overhead.

Joel, if we wanted to add a pageload test for about:home, where do you think we should add it? How about a new pageload test: "builtin-content" which will page-load measure some of our internal pages? such as about:addons, about:preferences, about:home, etc?
Flags: needinfo?(jmaher)
The patch does still measure painting the content, but just the initial about:blank, so it removes the
asynchronous loading of data: before anything is measured.
So in e10s it should be effectively: open new window and create the first OOP tab in it and measure when that tab gets first paint.
(whether or not something is actually loaded to the tab. initial about:blank isn't technically loaded, it is just created.)
Flags: needinfo?(bugs)
do we need a pageload test for our internal content?  How would that be different than ts_paint?

One issue in doing this is determining when the page is loaded.  These are usually chrome style pages, so our method for loading/verifying/reporting will be different.

it sounds like we are trying to add a lot of specific tests for things- is this something we would want to track long term, or something we want to optimize once and move on?  The more tests we add, the more overhead in regressions and tooling and machine time we incur.
Flags: needinfo?(jmaher)
Enough people have already chimed in, so I'll keep this brief. 

- My preference is to extend tpaint to report two numbers: the current MozAfterPaint measurement and smaug's new measurement. That way we get continuity with past scores + we avoid the setup/teardown overhead of creating a completely separate new test. 
- Joel: I thought we had nearly reached max capacity for Talos tests?
Flags: needinfo?(vdjeric)
Vladan- we can add more tests- but the more tests we add, the more we need the original test authors to be responsible for crashes, regressions, etc.

as for tpaint to report 2 numbers, it would need more talos work for that since this is a startup test and only measures 1 value.
I don't see any action here- we did a lot of work in the past 6 months to determine what was useful to measure and measure it- this didn't make the cut.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.