Closed Bug 1357093 Opened 8 years ago Closed 7 years ago

Find out why the GPU-accelerated tab throbbers caused Talos regressions

Categories

(Firefox :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1352119

People

(Reporter: mconley, Unassigned)

References

Details

(Keywords: perf)

In bug 759252, an SVG image was landed for the new loading / connecting throbbers in tabs. We used GPU-accelerated rotation transforms to ensure that these throbbers animated at 60fps as much as possible. When it landed, one of the things we noticed was a Talos regression - that was filed as bug 1352085. Specifically, the following Talos changes were noticed: Regressions: 11% kraken summary osx-10-10 opt 1465.97 -> 1621.92 10% tsvgx summary osx-10-10 opt 500.81 -> 548.42 7% damp summary linux64 opt 337.84 -> 362.28 7% tp5o summary osx-10-10 opt 304.65 -> 325.9 6% damp summary linux64 pgo 274.88 -> 292.12 6% tsvgr_opacity summary osx-10-10 opt 432.98 -> 459.73 5% tp5o responsiveness linux64 opt 50.35 -> 52.8 5% tp5o responsiveness linux64 pgo 27.33 -> 28.59 3% tsvgx summary linux64 opt 531.18 -> 549.25 3% tp5o summary linux64 opt 362.75 -> 373.88 3% tp5o summary linux64 pgo 258.32 -> 265.28 2% tsvgx summary linux64 pgo 373.07 -> 382.31 Improvements: 10% tart summary osx-10-10 opt 11.15 -> 10.09 9% tart summary linux64 opt e10s 6.61 -> 6.02 8% tart summary linux64 opt 6.27 -> 5.79 6% tart summary linux64 pgo e10s 5.05 -> 4.75 5% tart summary linux64 pgo 4.49 -> 4.26 Similarly, higher CPU usage was noticed with the new throbbers. Bug 1354080 was filed for that, and a patch was landed that took a chunk of the CPU usage regression away, but not all. Bug 759252 was backed out because of the above reasons, meaning that we're now using the animated PNGs again, which means that these animations can jank when the main thread gets blocked. :/ Part of the rationale for backing out the new throbbers is because Photon is going to replace the throbbers with a new design anyways (bug 1352119). The animations in bug 1352119 (and in Photon in general) are all going to be GPU accelerated, so we should take this opportunity to at least understand (and hopefully fix) the reason why the throbbers caused the regression. We should do that soon, so that we can be prepared when the new animations land.
Flags: qe-verify?
Priority: -- → P2
Kraken is supposed to be testing SpiderMonkey performance, right? So the testcase from Kraken is not representative of actual browser usage, but it is affected by changes to tabbrowser? It seems that we should move Kraken to run in a chromeless window. Performance changes in SpiderMonkey are therefore less noticeable since they are mixed with chrome/tabbrowser code. Same question for tsvgx, from https://gbrownmozilla.wordpress.com/2016/09/30/firefox-for-android-performance-measures-q3-check-up-2/: > An svg-only number that measures SVG rendering performance. About half of the tests are animations or > iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the > maximum rendering throughput of each test. The reported value is the page load time, or, for > animations/iterations – overall duration the sequence/animation took to complete. Lower values are better. And DAMP: "measuring: Developer Tools toolbox startup, shutdown, and reload performance" The key thing to note here is that we saw improvements in tart, which is specifically testing tab animations, and we didn't see any regressions the startup tests (ts_paint, tpaint, tresize, sessionrestore[_no_auto_restore], and tcanvasmark).
Flags: needinfo?(jmaher)
Assignee: nobody → mconley
Status: NEW → ASSIGNED
Iteration: --- → 55.4 - May 1
Priority: P2 → P1
:jaws, thanks for bringing this topic up. I think you have some valid points. I would like to balance those with running tests in an environment that end users would see or competitors would judge against. Kraken is run in AWFY (headless) and probably provides more value there than we get from running it in Talos proper- I should push harder on closing the gap in AWFY vs Talos. As for TSVGX and DAMP- those do need a proper browser to run in- or at least something that renders properly. DAMP would need more of it, the svg stuff needs a full rendering engine. I am excited we saw TART improvements, and would rather not change a bunch of tests to support our development for new code, although that isn't fully out of the question.
Flags: needinfo?(jmaher)
Flags: qe-verify? → qe-verify-
Note that while we have these tests in the state that they are in (aka running with the full browser chrome and whatnot) there is a great chance that this regression is actually uncovering a real issue. Running performance tests in simulated environments such as a headless browser isn't really interesting with the goal of testing the performance of something close to what real users experience. Not saying that's what the existing Talos tests give us, just pointing out that moving things further away from a real browser environment isn't necessarily a good idea either.
At least for tsvgx, at least part of the problem appears to be that the animations in the hixie-*.svg files are kicked off after the "load" event, which occurs _before_ the tabbrowser binding has heard STATE_STOP and STATE_IS_NETWORK via the tab's nsIWebProgressListener. This means that the GPU-accelerated throbber tends to appear _during_ the tsvgx test, meaning that we're compositing more frequently and likely taking more time to do things like "swap buffers" during the period of time that the test is recording.
Iteration: 55.4 - May 1 → 55.5 - May 15
Depends on: 1362120
Not sure why bug 1362120 was added as a dependency. Bug 1362120 seems more related to the svg.disabled pref applying to the browser UI and not just web content.
No longer depends on: 1362120
See Also: → 1355416
Bas's patch in bug 1355416 might impact this, because apparently there are cases where we'll consume CPU when we're GPU bound, which might be contributing to these regressions.
Iteration: 55.5 - May 15 → 55.6 - May 29
It looks like the Sleep(0) referred to in bug 1355416 only occurs for Windows, which is the _only platform_ that we didn't regress here. So fixing that would have no impact here.
See Also: 1355416
(In reply to Mike Conley (:mconley) from comment #7) > It looks like the Sleep(0) referred to in bug 1355416 only occurs for > Windows, which is the _only platform_ that we didn't regress here. So fixing > that would have no impact here. I should note that that should also -only- have impact when the GPU is very busy. If a tab throbber would cause us to make the GPU be very busy, we have bigger issues :-).
Iteration: 55.6 - May 29 → 55.7 - Jun 12
I'm shifting the focus of this bug slightly. The original SVG tab throbbers are being superseded by the new Photon tab loading throbbers in bug 1352119. I've pushed the WIP in that patch to try to see what Talos regressions (if any) we're likely to hit along the way. Unfortunately, it's a whole new whack of regressions (with high confidence except where otherwise noted): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a7348c1f7b97&newProject=try&newRevision=7dbcb13ab7b0&framework=1 Win 32: damp opt e10s (~2% - medium confidence) Quantum_1 opt e10s (~2%) sessionrestore opt e10s (~14%) sessionrestore_no_auto_restore opt e10s (~12%) tp5o % Processor Time opt e10s (~18%) tp5o Main_RSS opt e10s (~3%) tp5o Private Bytes opt e10s (~5%) tp5o_webext % Processor Time opt e10s (~25%) tp5o_webext Main_RSS opt e10s (~3%) tp5o_webext opt e10s (~3%) Linux 64: damp opt e10s (~2% - medium confidence) tp5o Main_RSS opt e10s (~4%) tp5o_webext Main_RSS opt e10s (~3%) ts_paint_webext opt e10s (~4%)
Ah hah - at least partly this is due to the fact that the Talos windows are thin enough such that the throbber isn't being run on the compositor. Bug 1364221 should help with that. The other thing is that there's a short fill transition when going between the progress and busy modes. I believe this causes us to repaint for each frame along the colour change.
Depends on: 1364221
I just confirmed locally that bug 1364221 greatly helps by ensuring that the throbber is run fully on the compositor during the test. The repaint that occurs during the transition between busy and progress states is still a bit of a problem - though perhaps we can do that by crossfading the opacity of two instances of the image overlaid on top of one another[1]. [1]: https://ariya.io/2014/02/tricks-for-gpu-composited-css
I've pushed patches that bypass bug 1371888 and also get rid of the transition between the busy state and the progress state (which causes repaints during the fill transition). Getting rid of the fill transition had a pretty solid impact on at least the sessionrestore tests. I'll mention this to squib in the implementation bug.
Depends on: 1372292
Depends on: 1372400
Iteration: 55.7 - Jun 12 → 56.1 - Jun 26
Did some investigations today on the tsvgx regression caused by the most recent version of the patch. The burst effect seems to be the big factor here. Using LayerScope, I found out that the circle that we're using for the effect, at least on OS X, is quite large, and probably accounts for this large composite I captured locally: https://perfht.ml/2t7ubJd I'm going to try shrinking the loading-burst.svg, and trying a rect instead of a circle.
So reducing the size of the burst helps, but it's clear we're just doing more work with the burst in general. I actually thing we're compositor bound here - we're just compositing more things.
Depends on: 1374333
Got some runs of the latest animation on tsvgx with profiles: Baseline: http://bit.ly/2tlUEDd New animation: http://bit.ly/2tlPtmC From mattwoodrow: "I did some more thinking about those profiles and I have some partial conclusions. I've attached an image of what looks to be the repeating pattern on the slower profile. Note that there are 4 composite markers, but only 3 paints. I'm fairly sure that the first paint corresponds to the third composite, and the second paint to the fourth composite, but this would be worth double checking. This would suggest that we're scheduling a composite that just updates what happened in the async animations, and doesn't reflect any changes from the actual talos test. In a test that is bound by the composition, that seems like the primary cause of the regression. It's less obvious *why* we're doing this composite without including the most recent paint. The composite in question in the second one (note no red layer transaction marker before it), but the first paint looks like it finished well before this. My guess would be that we've already got the composite scheduled in the event queue, and we run it, even though the layer transaction is also ready, but later in the event queue. This may be an ASAP-mode specific problem, but it seems worth investigating. As for the super-slow uploads: Normally, composition and texture uploads are asynchronous and should barely show up in profiles. Composition (SwapBuffers in particular) can block if the previous frame's composition hasn't yet completed, which is why we generally see SwapBuffers taking the majority of the time in ASAP mode. Slow texture uploads will generally result in time being consumed in the following frame's attempt to SwapBuffers. We can see this behaviour in the first 3 composites, where the layer transaction (if present) is really quick, and the composite is slow. For the fourth frame, we appear to be blocking during texture upload, which then results in the following SwapBuffers being async again, and we see a really quick composite. My guess here is that we're trying to do a partial upload (the time is spent within glTexSubImage2D) to a texture that is still in use by a previous (and unfinished) composite, so we have to block at that point. I can't figure out why we'd have upload contention in this case when we don't normally, except for waving my hands and muttering about the extra composite. Hope that's helpful!"
For tsvgx, it appears as if a major portion of the regression is due to the fact that both the parent process and the content process have ASAP mode turned on for this test. This means that we end up running the throbber and burst animation at an abnormally fast rate, and spam the hell out of the compositor with a super fast framerate, which is just not realistic. We probably want to turn off ASAP in the parent process when running tsvgx. My local runs show a vast improvement in that test when we do that.
Alternatively, the tsvgx tests (since I suspect they're really only interested in measuring our ability to paint SVG), should be run in a browser window without showing any of its UI, in order to reduce noise.
Depends on: 1374765
Depends on: 1279396
So, for the folks keeping track: 1) ts_paint regression can be eliminated by ensuring we load the SVGs as late as possible (so using display: none on the image container until needed). 2) tsvg tests shouldn't be a problem once bug 1374765 is fixed 3) sessionrestore / sessionrestore_no_auto_restore shouldn't be a problem now that bug 1372292 has landed 4) Main_RSS and Private Bytes measurements will go away once bug 1279396 is fixed. Those measurements are flawed, and AWSY is a better system for measuring memory usage. That leaves the following tests requiring some investigation for Windows: damp opt e10s (~2% - medium confidence) Quantum_1 opt e10s (~2%) tp5o % Processor Time opt e10s (~18%) tp5o_webext % Processor Time opt e10s (~25%) tp5o_webext opt e10s (~3%)
For the tp5o regression, I noticed that the "aljazeera" subtest tended to show heaviest regression, and I was able to reproduce locally and capture profiles. Before squib's patch: http://bit.ly/2sCclhG After: http://bit.ly/2sCdINs I talked this over with jrmuizel from the Graphics team. What we're noticing in these profiles isn't so much a bottleneck, but more like a general slow down. Things just seem to be taking longer. We experimented with the frame rate a bit (layout.frame_rate), and we have a pretty strong hypothesis about what's going on. We suspect that the 60fps animation is drawing CPU cycles away from other threads, which generally slows things down. The 60 fps animation is crazy-smooth, like super smooth, and UX has designed it very well - the problem seems to be, basically, that we're just demanding more from the machine to do that animation smoothly, and this is the cost. Assuming the above is true (and our experiments with layout.frame_rate support it), I suggest the following: a) Determine whether or not we want tp5o test to show the browser UI. Do the folks who care about what tp5o measures care about this sort of thing? b) Have Product weigh the trade-off here. We have a trade-off between a super awesome 60fps animation vs generally loading pages faster. This problem is unique, I think, to the loading throbber / burst simply because this animation always occurs during page load, and draws resources away from the CPU that we're using to render web content.
There's a potential middleground here too - UX could produce a spritesheet for the loading throbber that works at a lower frame rate, and then we set the step functions in our animation CSS accordingly. That means we get a smooth lower frame rate animation while sacrificing less CPU power.
(In reply to Mike Conley (:mconley) from comment #22) > There's a potential middleground here too - UX could produce a spritesheet > for the loading throbber that works at a lower frame rate, and then we set > the step functions in our animation CSS accordingly. That means we get a > smooth lower frame rate animation while sacrificing less CPU power. can you test without tab throbbers and post how much performance boost it causes? Personally care more about performance than looks and web-content performance is more important , so a preference to disable tab throbbers would be welcome considering on my work lap firefox is slow with animations already and disabling toolkit.cosmeticAnimations.enabled gives it a huge performance boost, so an option to completely disable it would be very kind and also reduce the power usage when on battery since high cpu/gpu won't be used on displaying Animations resulting in faster content loads and and increase in webpage load times.
Flags: needinfo?(mconley)
(In reply to Mike Conley (:mconley) from comment #25) > More recent comparison both with and without the throbbers, now including OS > X: > > https://treeherder.mozilla.org/perf.html#/ > compare?originalProject=try&originalRevision=7a3ce6b601f2&newProject=try&newR > evision=76d3df1a8885&framework=1&filter=osx&showOnlyImportant=0 > Looks like there are a few more tests to turn the browser UI off for: 1. tsvgr_opacity 2. glterrain 3. dromaeo (dom and css) 4. tscrollx 5. basic_compositor_video 6. glvideo 7. a11yr These three tests: 1. tart 2. cart 3. damp Exercise the browser UI more, so I think we should keep the UI activated, and try to investigate these regressions. I've filed bug 1375956 to disable the browser UI in the first set of tests.
(In reply to shellye from comment #24) > Personally care more about performance than looks and web-content > performance is more important , so a preference to disable tab throbbers > would be welcome considering on my work lap firefox is slow with animations > already and disabling toolkit.cosmeticAnimations.enabled gives it a huge > performance boost, so an option to completely disable it would be very kind > and also reduce the power usage when on battery since high cpu/gpu won't be > used on displaying Animations resulting in faster content loads and and > increase in webpage load times. Interestingly, at least according to Talos (which is imperfect and only measures a small set of things), full-on disabling the throbber doesn't affect the tp5o test significantly - at least as compared against what we currently ship, with the APNG throbber.
(In reply to Mike Conley (:mconley) from comment #27) > (In reply to shellye from comment #24) > > Personally care more about performance than looks and web-content > > performance is more important , so a preference to disable tab throbbers > > would be welcome considering on my work lap firefox is slow with animations > > already and disabling toolkit.cosmeticAnimations.enabled gives it a huge > > performance boost, so an option to completely disable it would be very kind > > and also reduce the power usage when on battery since high cpu/gpu won't be > > used on displaying Animations resulting in faster content loads and and > > increase in webpage load times. > > Interestingly, at least according to Talos (which is imperfect and only > measures a small set of things), full-on disabling the throbber doesn't > affect the tp5o test significantly - at least as compared against what we > currently ship, with the APNG throbber. So basically no improvement by disabling throbbers? That's odd right? Still a preference to disable tab throbbers would be welcome considering toolkit.cosmeticAnimations.enabled=false gives a good speed boost (which was not that much in tests) Thanks for the detailed info :)
Iteration: 56.1 - Jun 26 → 56.2 - Jul 10
Iteration: 56.2 - Jul 10 → 56.3 - Jul 24
Iteration: 56.3 - Jul 24 → 56.4 - Aug 1
The work I've been doing in this bug has started to be applied in the actual implementation bug. I don't think there's much utility in having this bug exist independently, so I'm just going to dupe it over.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Assignee: mconley → nobody
Iteration: 56.4 - Aug 1 → ---
Flags: qe-verify-
Priority: P1 → --
Whiteboard: [photon-performance]
You need to log in before you can comment on or make changes to this bug.