Find out why the GPU-accelerated tab throbbers caused Talos regressions

RESOLVED DUPLICATE of bug 1352119

Status

()

Firefox
General
RESOLVED DUPLICATE of bug 1352119
4 months ago
23 days ago

People

(Reporter: mconley, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 1 bug, {perf})

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

4 months ago
In bug 759252, an SVG image was landed for the new loading / connecting throbbers in tabs. We used GPU-accelerated rotation transforms to ensure that these throbbers animated at 60fps as much as possible.

When it landed, one of the things we noticed was a Talos regression - that was filed as bug 1352085. Specifically, the following Talos changes were noticed:

Regressions:

 11%  kraken summary osx-10-10 opt      1465.97 -> 1621.92
 10%  tsvgx summary osx-10-10 opt       500.81 -> 548.42
  7%  damp summary linux64 opt          337.84 -> 362.28
  7%  tp5o summary osx-10-10 opt        304.65 -> 325.9
  6%  damp summary linux64 pgo          274.88 -> 292.12
  6%  tsvgr_opacity summary osx-10-10 opt 432.98 -> 459.73
  5%  tp5o responsiveness linux64 opt   50.35 -> 52.8
  5%  tp5o responsiveness linux64 pgo   27.33 -> 28.59
  3%  tsvgx summary linux64 opt         531.18 -> 549.25
  3%  tp5o summary linux64 opt          362.75 -> 373.88
  3%  tp5o summary linux64 pgo          258.32 -> 265.28
  2%  tsvgx summary linux64 pgo         373.07 -> 382.31

Improvements:

 10%  tart summary osx-10-10 opt      11.15 -> 10.09
  9%  tart summary linux64 opt e10s   6.61 -> 6.02
  8%  tart summary linux64 opt        6.27 -> 5.79
  6%  tart summary linux64 pgo e10s   5.05 -> 4.75
  5%  tart summary linux64 pgo        4.49 -> 4.26


Similarly, higher CPU usage was noticed with the new throbbers. Bug 1354080 was filed for that, and a patch was landed that took a chunk of the CPU usage regression away, but not all.

Bug 759252 was backed out because of the above reasons, meaning that we're now using the animated PNGs again, which means that these animations can jank when the main thread gets blocked. :/

Part of the rationale for backing out the new throbbers is because Photon is going to replace the throbbers with a new design anyways (bug 1352119). The animations in bug 1352119 (and in Photon in general) are all going to be GPU accelerated, so we should take this opportunity to at least understand (and hopefully fix) the reason why the throbbers caused the regression. We should do that soon, so that we can be prepared when the new animations land.

Updated

4 months ago
Flags: qe-verify?
Priority: -- → P2
Kraken is supposed to be testing SpiderMonkey performance, right? So the testcase from Kraken is not representative of actual browser usage, but it is affected by changes to tabbrowser? It seems that we should move Kraken to run in a chromeless window. Performance changes in SpiderMonkey are therefore less noticeable since they are mixed with chrome/tabbrowser code.

Same question for tsvgx, from https://gbrownmozilla.wordpress.com/2016/09/30/firefox-for-android-performance-measures-q3-check-up-2/:
> An svg-only number that measures SVG rendering performance. About half of the tests are animations or 
> iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the 
> maximum rendering throughput of each test. The reported value is the page load time, or, for 
> animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

And DAMP: "measuring: Developer Tools toolbox startup, shutdown, and reload performance"

The key thing to note here is that we saw improvements in tart, which is specifically testing tab animations, and we didn't see any regressions the startup tests (ts_paint, tpaint, tresize, sessionrestore[_no_auto_restore], and tcanvasmark).
Flags: needinfo?(jmaher)

Updated

4 months ago
Assignee: nobody → mconley
Status: NEW → ASSIGNED
Iteration: --- → 55.4 - May 1
Priority: P2 → P1
:jaws, thanks for bringing this topic up.  I think you have some valid points. I would like to balance those with running tests in an environment that end users would see or competitors would judge against.  Kraken is run in AWFY (headless) and probably provides more value there than we get from running it in Talos proper- I should push harder on closing the gap in AWFY vs Talos.

As for TSVGX and DAMP- those do need a proper browser to run in- or at least something that renders properly.  DAMP would need more of it, the svg stuff needs a full rendering engine.  I am excited we saw TART improvements, and would rather not change a bunch of tests to support our development for new code, although that isn't fully out of the question.
Flags: needinfo?(jmaher)

Updated

4 months ago
Flags: qe-verify? → qe-verify-
Note that while we have these tests in the state that they are in (aka running with the full browser chrome and whatnot) there is a great chance that this regression is actually uncovering a real issue.  Running performance tests in simulated environments such as a headless browser isn't really interesting with the goal of testing the performance of something close to what real users experience.  Not saying that's what the existing Talos tests give us, just pointing out that moving things further away from a real browser environment isn't necessarily a good idea either.
(Reporter)

Comment 4

4 months ago
At least for tsvgx, at least part of the problem appears to be that the animations in the hixie-*.svg files are kicked off after the "load" event, which occurs _before_ the tabbrowser binding has heard STATE_STOP and STATE_IS_NETWORK via the tab's nsIWebProgressListener. This means that the GPU-accelerated throbber tends to appear _during_ the tsvgx test, meaning that we're compositing more frequently and likely taking more time to do things like "swap buffers" during the period of time that the test is recording.

Updated

4 months ago
Iteration: 55.4 - May 1 → 55.5 - May 15

Updated

4 months ago
Depends on: 1362120
(Reporter)

Comment 5

4 months ago
Not sure why bug 1362120 was added as a dependency. Bug 1362120 seems more related to the svg.disabled pref applying to the browser UI and not just web content.
No longer depends on: 1362120
(Reporter)

Updated

4 months ago
See Also: → bug 1355416
(Reporter)

Comment 6

4 months ago
Bas's patch in bug 1355416 might impact this, because apparently there are cases where we'll consume CPU when we're GPU bound, which might be contributing to these regressions.
(Reporter)

Updated

4 months ago
Blocks: 1363777
No longer blocks: 1348289

Updated

3 months ago
Iteration: 55.5 - May 15 → 55.6 - May 29
(Reporter)

Comment 7

3 months ago
It looks like the Sleep(0) referred to in bug 1355416 only occurs for Windows, which is the _only platform_ that we didn't regress here. So fixing that would have no impact here.
See Also: bug 1355416
(In reply to Mike Conley (:mconley) from comment #7)
> It looks like the Sleep(0) referred to in bug 1355416 only occurs for
> Windows, which is the _only platform_ that we didn't regress here. So fixing
> that would have no impact here.

I should note that that should also -only- have impact when the GPU is very busy. If a tab throbber would cause us to make the GPU be very busy, we have bigger issues :-).

Updated

3 months ago
Iteration: 55.6 - May 29 → 55.7 - Jun 12
Comment hidden (obsolete)
(Reporter)

Comment 10

3 months ago
I'm shifting the focus of this bug slightly. The original SVG tab throbbers are being superseded by the new Photon tab loading throbbers in bug 1352119.

I've pushed the WIP in that patch to try to see what Talos regressions (if any) we're likely to hit along the way.

Unfortunately, it's a whole new whack of regressions (with high confidence except where otherwise noted):

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a7348c1f7b97&newProject=try&newRevision=7dbcb13ab7b0&framework=1

Win 32:

damp opt e10s (~2% - medium confidence)
Quantum_1 opt e10s (~2%)
sessionrestore opt e10s (~14%)
sessionrestore_no_auto_restore opt e10s (~12%)
tp5o % Processor Time opt e10s (~18%)
tp5o Main_RSS opt e10s (~3%)
tp5o Private Bytes opt e10s (~5%)
tp5o_webext % Processor Time opt e10s (~25%)
tp5o_webext Main_RSS opt e10s (~3%)
tp5o_webext opt e10s (~3%)

Linux 64:
damp opt e10s (~2% - medium confidence)
tp5o Main_RSS opt e10s (~4%)
tp5o_webext Main_RSS opt e10s (~3%)
ts_paint_webext opt e10s (~4%)
(Reporter)

Comment 11

3 months ago
Ah hah - at least partly this is due to the fact that the Talos windows are thin enough such that the throbber isn't being run on the compositor. Bug 1364221 should help with that.

The other thing is that there's a short fill transition when going between the progress and busy modes. I believe this causes us to repaint for each frame along the colour change.
Depends on: 1364221
(Reporter)

Comment 12

3 months ago
I just confirmed locally that bug 1364221 greatly helps by ensuring that the throbber is run fully on the compositor during the test. The repaint that occurs during the transition between busy and progress states is still a bit of a problem - though perhaps we can do that by crossfading the opacity of two instances of the image overlaid on top of one another[1].

[1]: https://ariya.io/2014/02/tricks-for-gpu-composited-css
(Reporter)

Comment 13

3 months ago
Pushed again to try now that bug 1364221 is on mozilla-central:

Baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=22e16d1c87981b57e94c2f57b3d0bd91c6398074
New throbber: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cf2cce69086c88915d5ab4c0417aea24500c7ca6

Comparison: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=22e16d1c87981b57e94c2f57b3d0bd91c6398074&newProject=try&newRevision=cf2cce69086c88915d5ab4c0417aea24500c7ca6&framework=1&showOnlyImportant=0
(Reporter)

Comment 14

2 months ago
I've pushed patches that bypass bug 1371888 and also get rid of the transition between the busy state and the progress state (which causes repaints during the fill transition).

Getting rid of the fill transition had a pretty solid impact on at least the sessionrestore tests.

I'll mention this to squib in the implementation bug.
(Reporter)

Updated

2 months ago
Depends on: 1372292
(Reporter)

Updated

2 months ago
Depends on: 1372400

Updated

2 months ago
Iteration: 55.7 - Jun 12 → 56.1 - Jun 26
(Reporter)

Comment 15

2 months ago
Did some investigations today on the tsvgx regression caused by the most recent version of the patch.

The burst effect seems to be the big factor here. Using LayerScope, I found out that the circle that we're using for the effect, at least on OS X, is quite large, and probably accounts for this large composite I captured locally:

https://perfht.ml/2t7ubJd

I'm going to try shrinking the loading-burst.svg, and trying a rect instead of a circle.
(Reporter)

Comment 16

2 months ago
So reducing the size of the burst helps, but it's clear we're just doing more work with the burst in general. I actually thing we're compositor bound here - we're just compositing more things.
(Reporter)

Updated

2 months ago
Depends on: 1374333
(Reporter)

Comment 17

2 months ago
Got some runs of the latest animation on tsvgx with profiles:

Baseline: http://bit.ly/2tlUEDd
New animation: http://bit.ly/2tlPtmC

From mattwoodrow:

"I did some more thinking about those profiles and I have some partial conclusions.

I've attached an image of what looks to be the repeating pattern on the slower profile. Note that there are 4 composite markers, but only 3 paints. I'm fairly sure that the first paint corresponds to the third composite, and the second paint to the fourth composite, but this would be worth double checking.

This would suggest that we're scheduling a composite that just updates what happened in the async animations, and doesn't reflect any changes from the actual talos test. In a test that is bound by the composition, that seems like the primary cause of the regression.

It's less obvious *why* we're doing this composite without including the most recent paint. The composite in question in the second one (note no red layer transaction marker before it), but the first paint looks like it finished well before this. My guess would be that we've already got the composite scheduled in the event queue, and we run it, even though the layer transaction is also ready, but later in the event queue. This may be an ASAP-mode specific problem, but it seems worth investigating.

As for the super-slow uploads: Normally, composition and texture uploads are asynchronous and should barely show up in profiles. Composition (SwapBuffers in particular) can block if the previous frame's composition hasn't yet completed, which is why we generally see SwapBuffers taking the majority of the time in ASAP mode. Slow texture uploads will generally result in time being consumed in the following frame's attempt to SwapBuffers. We can see this behaviour in the first 3 composites, where the layer transaction (if present) is really quick, and the composite is slow.

For the fourth frame, we appear to be blocking during texture upload, which then results in the following SwapBuffers being async again, and we see a really quick composite.

My guess here is that we're trying to do a partial upload (the time is spent within glTexSubImage2D) to a texture that is still in use by a previous (and unfinished) composite, so we have to block at that point. I can't figure out why we'd have upload contention in this case when we don't normally, except for waving my hands and muttering about the extra composite.

Hope that's helpful!"
(Reporter)

Comment 18

2 months ago
For tsvgx, it appears as if a major portion of the regression is due to the fact that both the parent process and the content process have ASAP mode turned on for this test.

This means that we end up running the throbber and burst animation at an abnormally fast rate, and spam the hell out of the compositor with a super fast framerate, which is just not realistic.

We probably want to turn off ASAP in the parent process when running tsvgx. My local runs show a vast improvement in that test when we do that.
(Reporter)

Comment 19

2 months ago
Alternatively, the tsvgx tests (since I suspect they're really only interested in measuring our ability to paint SVG), should be run in a browser window without showing any of its UI, in order to reduce noise.
(Reporter)

Updated

2 months ago
Depends on: 1374765
(Reporter)

Updated

2 months ago
Depends on: 1279396
(Reporter)

Comment 20

2 months ago
So, for the folks keeping track:

1) ts_paint regression can be eliminated by ensuring we load the SVGs as late as possible (so using display: none on the image container until needed).
2) tsvg tests shouldn't be a problem once bug 1374765 is fixed
3) sessionrestore / sessionrestore_no_auto_restore shouldn't be a problem now that bug 1372292 has landed
4) Main_RSS and Private Bytes measurements will go away once bug 1279396 is fixed. Those measurements are flawed, and AWSY is a better system for measuring memory usage.

That leaves the following tests requiring some investigation for Windows:

damp opt e10s (~2% - medium confidence)
Quantum_1 opt e10s (~2%)
tp5o % Processor Time opt e10s (~18%)
tp5o_webext % Processor Time opt e10s (~25%)
tp5o_webext opt e10s (~3%)
(Reporter)

Comment 21

2 months ago
For the tp5o regression, I noticed that the "aljazeera" subtest tended to show heaviest regression, and I was able to reproduce locally and capture profiles.

Before squib's patch: http://bit.ly/2sCclhG
After: http://bit.ly/2sCdINs

I talked this over with jrmuizel from the Graphics team. What we're noticing in these profiles isn't so much a bottleneck, but more like a general slow down. Things just seem to be taking longer.

We experimented with the frame rate a bit (layout.frame_rate), and we have a pretty strong hypothesis about what's going on.

We suspect that the 60fps animation is drawing CPU cycles away from other threads, which generally slows things down. The 60 fps animation is crazy-smooth, like super smooth, and UX has designed it very well - the problem seems to be, basically, that we're just demanding more from the machine to do that animation smoothly, and this is the cost.

Assuming the above is true (and our experiments with layout.frame_rate support it), I suggest the following:

a) Determine whether or not we want tp5o test to show the browser UI. Do the folks who care about what tp5o measures care about this sort of thing?
b) Have Product weigh the trade-off here. We have a trade-off between a super awesome 60fps animation vs generally loading pages faster. This problem is unique, I think, to the loading throbber / burst simply because this animation always occurs during page load, and draws resources away from the CPU that we're using to render web content.
(Reporter)

Comment 22

2 months ago
There's a potential middleground here too - UX could produce a spritesheet for the loading throbber that works at a lower frame rate, and then we set the step functions in our animation CSS accordingly. That means we get a smooth lower frame rate animation while sacrificing less CPU power.
(Reporter)

Comment 23

2 months ago
Here's a more recent comparison, using the most recent patches in bug 1352119:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=eae766ce4278&newProject=try&newRevision=80e43193a8c7&framework=1

Comment 24

2 months ago
(In reply to Mike Conley (:mconley) from comment #22)
> There's a potential middleground here too - UX could produce a spritesheet
> for the loading throbber that works at a lower frame rate, and then we set
> the step functions in our animation CSS accordingly. That means we get a
> smooth lower frame rate animation while sacrificing less CPU power.

can you test without tab throbbers and post how much performance boost it causes?

Personally care more about performance than looks and web-content performance is more important , so a preference to disable tab throbbers would be welcome considering on my work lap firefox is slow with animations already and disabling toolkit.cosmeticAnimations.enabled gives it a huge performance boost, so an option to completely disable it would be very kind and also reduce the power usage when on battery since high cpu/gpu won't be used on displaying  Animations resulting in faster content loads and and increase in webpage load times.
Flags: needinfo?(mconley)
(Reporter)

Comment 25

2 months ago
More recent comparison both with and without the throbbers, now including OS X:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=7a3ce6b601f2&newProject=try&newRevision=76d3df1a8885&framework=1&filter=osx&showOnlyImportant=0

(In reply to shellye from comment #24)

> can you test without tab throbbers and post how much performance boost it
> causes?
> 

Sure - pushed:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=7a3ce6b601f2&newProject=try&newRevision=9eede4a5a10a0eccbbcafe7e08d3e965c7ed4d0b&framework=1&showOnlyImportant=0
Flags: needinfo?(mconley)
(Reporter)

Comment 26

2 months ago
(In reply to Mike Conley (:mconley) from comment #25)
> More recent comparison both with and without the throbbers, now including OS
> X:
> 
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=7a3ce6b601f2&newProject=try&newR
> evision=76d3df1a8885&framework=1&filter=osx&showOnlyImportant=0
> 

Looks like there are a few more tests to turn the browser UI off for:

1. tsvgr_opacity
2. glterrain
3. dromaeo (dom and css)
4. tscrollx
5. basic_compositor_video
6. glvideo
7. a11yr

These three tests:

1. tart
2. cart
3. damp

Exercise the browser UI more, so I think we should keep the UI activated, and try to investigate these regressions.

I've filed bug 1375956 to disable the browser UI in the first set of tests.
(Reporter)

Comment 27

2 months ago
(In reply to shellye from comment #24)
> Personally care more about performance than looks and web-content
> performance is more important , so a preference to disable tab throbbers
> would be welcome considering on my work lap firefox is slow with animations
> already and disabling toolkit.cosmeticAnimations.enabled gives it a huge
> performance boost, so an option to completely disable it would be very kind
> and also reduce the power usage when on battery since high cpu/gpu won't be
> used on displaying  Animations resulting in faster content loads and and
> increase in webpage load times.

Interestingly, at least according to Talos (which is imperfect and only measures a small set of things), full-on disabling the throbber doesn't affect the tp5o test significantly - at least as compared against what we currently ship, with the APNG throbber.

Comment 28

2 months ago
(In reply to Mike Conley (:mconley) from comment #27)
> (In reply to shellye from comment #24)
> > Personally care more about performance than looks and web-content
> > performance is more important , so a preference to disable tab throbbers
> > would be welcome considering on my work lap firefox is slow with animations
> > already and disabling toolkit.cosmeticAnimations.enabled gives it a huge
> > performance boost, so an option to completely disable it would be very kind
> > and also reduce the power usage when on battery since high cpu/gpu won't be
> > used on displaying  Animations resulting in faster content loads and and
> > increase in webpage load times.
> 
> Interestingly, at least according to Talos (which is imperfect and only
> measures a small set of things), full-on disabling the throbber doesn't
> affect the tp5o test significantly - at least as compared against what we
> currently ship, with the APNG throbber.

So basically no improvement by disabling throbbers? That's odd right?
Still a preference to disable tab throbbers would be welcome considering toolkit.cosmeticAnimations.enabled=false gives a good speed boost (which was not that much in tests)


Thanks for the detailed info :)
(Reporter)

Comment 29

2 months ago
More up-to-day comparison between baseline and newest version of bug 1352119:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dba027c4192d&newProject=try&newRevision=635630525317&framework=1

Updated

2 months ago
Iteration: 56.1 - Jun 26 → 56.2 - Jul 10

Updated

a month ago
Iteration: 56.2 - Jul 10 → 56.3 - Jul 24

Updated

a month ago
Iteration: 56.3 - Jul 24 → 56.4 - Aug 1
(Reporter)

Comment 30

23 days ago
The work I've been doing in this bug has started to be applied in the actual implementation bug. I don't think there's much utility in having this bug exist independently, so I'm just going to dupe it over.
Status: ASSIGNED → RESOLVED
Last Resolved: 23 days ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1352119

Updated

23 days ago
Assignee: mconley → nobody
Iteration: 56.4 - Aug 1 → ---
Flags: qe-verify-
Priority: P1 → --
Whiteboard: [photon-performance]
You need to log in before you can comment on or make changes to this bug.