Open Bug 1233163 (apz-talos) Opened 9 years ago Updated 2 years ago

[meta] Talos regression tracking for APZ on desktop

Categories

(Core :: Panning and Zooming, defect, P3)

People

(Reporter: botond, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [gfx-noted])

BenWa suggested that we consolidate the tracking of APZ Talos regressions into a single bug, where we track Talos performance with and without APZ over time.

As APZ is enabled on Nightly, we can get the APZ numbers from the Talos runs on m-c pushes.

To get the non-APZ numbers, we can do periodic Try pushes with the APZ pref disabled.

Once we're happy with the numbers, we can close this bug and its dependencies.
Depends on: 1216924
(In reply to Botond Ballo [:botond] from comment #0)
> To get the non-APZ numbers, we can do periodic Try pushes with the APZ pref
> disabled.

Here's one based on a recent m-c: https://treeherder.mozilla.org/#/jobs?repo=try&revision=905696b2ba41
(In reply to Botond Ballo [:botond] from comment #1)
> Here's one based on a recent m-c:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=905696b2ba41

Tweaking the trychooser syntax a bit to actually capture Talos tests on all three desktop platforms:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f699cdcf6d1
I did some retriggers on the above try push and the base m-c push to make sure we have enough data points. Results should show up at https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=99137d6d4061&newProject=try&newRevision=2f699cdcf6d1&framework=1 as they finish up.
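
For anyone repeating this with other revisions, here is a rough sketch of how the compare link above is put together. The query parameter names are taken from the link itself; the helper name `compare_url` is made up for illustration, and the URL scheme is only assumed to stay as it appears in this comment.

```python
from urllib.parse import urlencode

# Base of the compare view, as it appears in the link above.
PERFHERDER_COMPARE = "https://treeherder.mozilla.org/perf.html#/compare"


def compare_url(original_project, original_revision,
                new_project, new_revision, framework=1):
    """Hypothetical helper: build a compare link from two pushes.

    Parameter names mirror the query string in the link above;
    framework=1 is simply the value that link uses.
    """
    query = urlencode({
        "originalProject": original_project,
        "originalRevision": original_revision,
        "newProject": new_project,
        "newRevision": new_revision,
        "framework": framework,
    })
    return f"{PERFHERDER_COMPARE}?{query}"


# Reproduces the link in this comment (m-c as original, try as new).
url = compare_url("mozilla-central", "99137d6d4061", "try", "2f699cdcf6d1")
print(url)
```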
The only real regression appears to be tscrollx. Retriggers on Windows/OSX are not getting scheduled, so I cancelled them and kicked off a new try push with a more recent m-c base to see if that works better.
Doh, I did that backwards. To see the regression introduced by APZ we need to have the "base" as the try push with APZ disabled and the "new" as the m-c push which has APZ enabled. So with that in mind tscrollx didn't regress at all, but other tests did.
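
To illustrate why the direction matters, here is a small sketch of a relative-change calculation. The numbers are invented, not Talos results, and `percent_change` is a hypothetical helper rather than anything from perfherder. Whether a positive delta is good or bad depends on the individual test; the point is only that swapping base and new flips the sign.

```python
def percent_change(base, new):
    """Relative change of `new` with respect to `base`, in percent."""
    return (new - base) / base * 100.0


# Invented example values, for illustration only.
apz_disabled = 100.0  # would come from the try push with APZ off
apz_enabled = 110.0   # would come from the m-c push with APZ on

# Intended direction: base = APZ disabled, new = APZ enabled.
intended = percent_change(base=apz_disabled, new=apz_enabled)

# Backwards direction: the same data shows up with the opposite sign.
swapped = percent_change(base=apz_enabled, new=apz_disabled)

print(f"intended: {intended:+.2f}%  swapped: {swapped:+.2f}%")
```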
Blocks: 1216924
No longer depends on: 1216924
I spent some time today drilling into the tart regression on Linux to confirm that it was largely caused by the event regions code. Since the profiler wasn't helping much, I isolated one of the tests, ran it a bunch of times, and adjusted the frame time recording code to record the paint cycle instead of the composite, which confirmed that's where the regression was happening. Since the test runs in ASAP mode, any slowdown on the paint side will in fact regress the test, but since the frame time only increases from ~2ms to ~2.2ms, it's only noticeable in ASAP mode. With 60fps vsync going, the difference washes out.
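
As a back-of-the-envelope illustration of that last point, here is a deliberately simplified model: in ASAP mode the next frame is assumed to start as soon as the previous one finishes, and with vsync each frame is assumed to be presented on the next 60Hz tick. The function names are made up, and the 2.0/2.2 values are the approximate figures quoted above, not exact measurements.

```python
import math

VSYNC_INTERVAL_MS = 1000.0 / 60.0  # roughly 16.67ms per frame at 60fps


def asap_frame_interval(work_ms):
    """ASAP mode (simplified): frames are produced back to back,
    so the interval between frames equals the work per frame."""
    return work_ms


def vsync_frame_interval(work_ms, vsync_ms=VSYNC_INTERVAL_MS):
    """Vsync (simplified): a frame is shown on the first vsync tick
    after its work completes, so intervals are whole multiples."""
    return math.ceil(work_ms / vsync_ms) * vsync_ms


def percent_change(base, new):
    return (new - base) / base * 100.0


before_ms, after_ms = 2.0, 2.2  # approximate frame times from above

asap_delta = percent_change(asap_frame_interval(before_ms),
                            asap_frame_interval(after_ms))
vsync_delta = percent_change(vsync_frame_interval(before_ms),
                             vsync_frame_interval(after_ms))

print(f"ASAP mode:  {asap_delta:+.1f}%")
print(f"60Hz vsync: {vsync_delta:+.1f}%")
```

Under these assumptions the extra 0.2ms shows up as roughly a 10% change in ASAP mode, while both frame times fit inside a single vsync interval and produce the same frame rate.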

The other main cause of the regression is the displayport area being larger. I didn't drill into that as deeply, but I checked that with APZ enabled and the displayport suppressed the numbers were a bit lower (in fact they were almost exactly what the numbers are with APZ disabled and event-regions enabled). So event regions and the larger displayport together account for the regression in tart. I'll take a quick look at tp5o as well since that's the other test which is showing large-ish regressions.
Looks like I can't run tp5o locally. However, according to talos/test.py it also runs in ASAP mode, as does pretty much every other test that showed a significant regression, so they will all be subject to the same issue as described above. So I'm fine with accepting this regression as the tradeoff for having APZ, and I'm going to move this off the blocker list. We should still figure out ways to improve the event regions code so that it doesn't eat up as much time. For example, bug 1203140 might be one way, although that probably won't affect most of the talos tests unless there are touch listeners on the pages.
Blocks: apz-desktop
No longer blocks: all-aboard-apz
APZ is riding the trains and looks to be affecting aurora. I assume this is intended; please confirm!
Flags: needinfo?(bugmail.mozilla)
Yes, this is intended. Thanks for checking!
Flags: needinfo?(bugmail.mozilla)
Keywords: meta
Whiteboard: [gfx-noted]
Depends on: 1251937
Blocks: 1251937
No longer depends on: 1251937
I was going to update some related e10s perf bugs, does anyone have a fresh try push? I only have a partial one (plus windows seem to be still pending).

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=e3a41a0486ee&newProject=try&newRevision=34f1f2a97447&framework=1

tp5o Main_RSS opt e10s
- osx-10-10     -13.44%

tp5o_scroll
- linux64       -32.97%
- osx-10-10     -34.64%

tps opt e10s
- linux64       -6.87%
- osx-10-10     -28.96%

ts_paint opt e10s
- osx-10-10     -10.88%
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #11)
> I was going to update some related e10s perf bugs, does anyone have a fresh
> try push? I only have a partial one (plus windows seem to be still pending).

We now have a dashboard for tracking e10s performance regressions:

https://treeherder.allizom.org/perf.html#/e10s
(In reply to William Lachance (:wlach) from comment #12)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #11)
> > I was going to update some related e10s perf bugs, does anyone have a fresh
> > try push? I only have a partial one (plus windows seem to be still pending).
> 
> We now have a dashboard for tracking e10s performance regressions:
> 
> https://treeherder.allizom.org/perf.html#/e10s

I don't get it. Where can I find the numbers with APZ off on that dashboard? Sorry if I'm missing something.
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #11)
> tp5o_scroll

this was opt e10s too
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #13)
> (In reply to William Lachance (:wlach) from comment #12)
> > (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #11)
> > > I was going to update some related e10s perf bugs, does anyone have a fresh
> > > try push? I only have a partial one (plus windows seem to be still pending).
> > 
> > We now have a dashboard for tracking e10s performance regressions:
> > 
> > https://treeherder.allizom.org/perf.html#/e10s
> 
> I don't get it. Where can I find the numbers with APZ off on that dashboard?
> Sorry if I'm missing something.

Sorry, I misunderstood this bug. That information is not on the dashboard, so you'll have to continue with the try run approach. :(
Bug 1242609 might help but it's not a strict requirement to close the regressions.
Depends on: 1242609
Some notes:

- e10s vs non-e10s talos numbers can be found on the e10s talos perfherder dashboard, at [1].

- a subset of the above regressions are caused by APZ (because APZ is enabled in the e10s case only). The improvement caused by disabling APZ can be seen at [2] and is summarized at [3].

- I'm attempting to further subdivide the APZ-caused regression to see what portion of it is caused by event regions, which get enabled by APZ and are needed for APZ correctness. I did a try push based on the same m-c cset as [2], but with APZ disabled and event-regions enabled. That try push is at [4].

[1] https://treeherder.allizom.org/perf.html#/e10s
[2] https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=a4929411c0aa&newProject=try&newRevision=850e39c8f668&framework=1
[3] https://wiki.mozilla.org/Electrolysis/Release_Criteria#APZ_Regressions
[4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5f7e2c7dbff
Depends on: 1257290
Severity: normal → S3