Closed Bug 1496649 Opened 6 years ago Closed 6 years ago

Convert tresize addon to a webextension

Categories

(Testing :: Talos, enhancement)

Version 3
normal

Tracking

(firefox64 fixed)

RESOLVED FIXED
mozilla64

People

(Reporter: aswan, Assigned: aswan)


Attachments

(3 files)

      No description provided.
Without really thinking through the consequences, I converted tresize to a pageloader test (it's not clear to me why it was ever a startup test).  As part of this I changed it so the test page is loaded via http instead of from a chrome: url.  But this caused a regression in the test measurements.  You can see the code and the results here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=99caef441b0127225a31ae5b281ab9044784adbb&selectedJob=203549308
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=99caef441b0127225a31ae5b281ab9044784adbb&framework=1&filter=tresize&selectedTimeRange=172800

I suspect that the regression is due to the fact that the window contents are now handled in a content process, so resize events have to be dispatched via IPC.  But this raises the question of what we want this test to cover.  The current (chrome:) implementation is more tightly focused on the XUL resize machinery, but switching to http would give us a more realistic measurement of what users will really experience in the field.

For that reason, I would prefer to make this change, though it would mean a (one-time) bump to our baseline measurements.  If there's a strong desire to keep the test more narrowly focused (i.e., keep using chrome: urls) I can adapt the patches to do that, but this is a question for the Talos experts.  I'm actually not sure exactly who to ask, so I'm trying the usual suspects; feel free to redirect the needinfo if necessary.
Flags: needinfo?(mconley)
Flags: needinfo?(jmaher)
In general I am fine with any test changes as long as they make sense; the numbers matter less to me than whether we are testing something useful.

We have gone back and forth on tresize.  Will the test be noisier now?  In this case, no: I only see shifts in OSX and Windows 10, not even in Windows 10 QR builds.

I think in the past we thought this test would be better served as a test for XUL code specifically so it could measure a small area and not get clouded by noise from the full stack.  Seeing that this change doesn't affect the noise, I wouldn't have any issues with this.

Maybe :mconley has conflicting thoughts based on areas of code to test.
Flags: needinfo?(jmaher)
I'll note that the test owners, according to https://wiki.mozilla.org/Buildbot/Talos/Tests#tresize include avih and jimm. avih isn't working on Firefox anymore, but I'm going to cc jimm in case he wants to weigh in too.

For context, tresize was initially added in bug 731159, and converted to be e10s compatible in bug 1102479.

Reading through bug 731159, I believe tresize was initially designed to be a repaint test for the graphics team. The question they were trying to answer was: "How quickly can we repaint the window as it is resizing?", and I think this was because we were having resize perf issues on hw accelerated Windows at the time: bug 783985

The thing to keep in mind is that tresize is measuring how long it takes to paint the content area between resizes, and with e10s, we have the drawing of the UI decoupled from the rendering of the content - so, as aswan points out in comment 1, we have process communication overhead to grapple with there.

Today, if you open an e10s window, and drag it around, it's less janky to resize the UI than resizing a non-e10s window, but the content has a harder time keeping up. That's the trade-off.

So it all boils down to: what do we want to measure? I actually think I'm more interested in measuring the resizing of the browser UI than I am the content. But that's because I'm biased towards improving the performance of the browser UI. vchin's team might be more interested in measuring the content resize performance - but even then, we'd probably want something a little more representative than the test page that we're loading here.

So, to sum, I'd prefer we load a chrome document here in the parent process and measure that.

Tangentially, looking through the test code - I'm a little worried that it's already a bit noisy, since it's waiting for MozAfterPaint without ensuring that the composite that causes it to fire is for the resize that the test causes[1].

[1]: https://groups.google.com/forum/#!searchin/mozilla.dev.platform/MozAfterPaint%7Csort:date/mozilla.dev.platform/pCLwWdYc_GY/j9A-vWm3AgAJ
Flags: needinfo?(mconley)
Thanks, that's all really helpful.  I'll switch back to chrome: and look into using all the modern MozAfterPaint goodness that's come along since this test was last updated.  I didn't hear any objection to switching tresize from a startup test to a pageloader test, I'll assume that makes sense and I didn't overlook some reason not to do it.  Let me know if I've got that wrong...
I am not sure whether tresize could be a pageloader test, but if you can figure out how to make that work, it would be great.  We would have to adjust the output, as Talos has different parsing for startup vs. pageloader tests.
Yep, no objection from me for turning this into a pageloader test.
The talos tresize test was originally written as a "startup" test which
is confusing since it doesn't measure anything that happens during
browser startup.  Convert it here to the "pageloader" style, which
mostly involves moving files around, also some changes to how the test
results are reported to the Talos framework.
Some new features were added to MozAfterPaint events in bugs 1264409 and
1264798 to allow tests to make more accurate measurements.  Updating
tresize to use these features is not directly related to the main goal
of this bug, but let's do it anyway while we're touching tresize...
This is a pretty straightforward conversion.  We deliberately continue
to use chrome: urls in this test since those are loaded in the parent
process so test measurements will not include additional noise from
IPC to a content process.
(In reply to Mike Conley (:mconley) (:⚙️) from comment #3)
> So it all boils down to: what do we want to measure? I actually think I'm
> more interested in measuring the resizing of the browser UI than I am the
> content. But that's because I'm biased towards improving the performance of
> the browser UI. vchin's team might be more interested in measuring the
> content resize performance - but even then, we'd probably want something a
> little more representative than the test page that we're loading here.

Okay, I pushed some patches that retain the chrome page.  I also switched to using the transaction id and timestamp fields from MozAfterPaint, and the result was a significant reduction (improvement) in times:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=553e842c07ddc61451ff698e6c9a3cb62a108476&framework=1&filter=tresize&selectedTimeRange=172800

I assume this is mostly a result of the MozAfterPaint changes.

Looking at the test a little more, I am a little concerned about it.  It starts with a browser window at 425x425, then incrementally increases it by 2 pixels in each dimension 300 times (to reach a final desired geometry of 1025x1025).  It measures the time it takes to repaint the browser window after each of these resizes.  I know next to nothing about the internals of layout and painting, but some of these resizes just repaint the same elements in a slightly different place, while others trigger event handlers in CustomizableUI.jsm that rearrange items in the DOM, causing a much more extensive re-layout.  But the test result is just an average of the 300 measurements, which vary widely (in a single unscientific sample on my laptop, the individual repaint times from a single test run varied from 6 to 57 milliseconds, a full order of magnitude!).  We're throwing away a bunch of information by recording only the average of these measurements.  Moreover, it would be really easy for some change in the chrome layout to alter the number of DOM manipulations that occur in the course of one test.  But that would change the timing of only a small number of the 300 measurements, and the difference would probably be lost in the noise.

I don't have any concrete ideas for changing the test and I'm not even sure that the value of finer-grained measurements would outweigh the cost of implementing and maintaining a more complex test, just putting this down here for posterity since I was looking at the test...
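To illustrate the averaging concern in comment above: here is a small sketch using hypothetical per-resize paint times in the 6–57 ms range mentioned there (the numbers are made up for illustration, not real test output).  The mean alone hides the spread that a median and a high percentile would expose.

```javascript
// Sketch with hypothetical per-resize repaint times (ms); a few expensive
// re-layouts sit in a long tail that the plain average largely obscures.
const times = [6, 7, 8, 9, 10, 12, 15, 30, 45, 57];

const mean = times.reduce((a, b) => a + b, 0) / times.length;
const sorted = [...times].sort((a, b) => a - b);
const median = sorted[Math.floor(sorted.length / 2)];
// index of the 95th-percentile sample (nearest-rank method)
const p95 = sorted[Math.min(sorted.length - 1, Math.ceil(0.95 * sorted.length) - 1)];

console.log(mean.toFixed(1), median, p95);  // → 19.9 12 57
```

A reported mean of ~20 ms looks unremarkable, while the median (12 ms) and p95 (57 ms) show that a handful of resizes are far more expensive than the rest; a layout change affecting only those few would barely move the average.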