461.12 KB, application/zip
4.93 KB, patch
|Details | Diff | Splinter Review|
4.95 KB, patch
|Details | Diff | Splinter Review|
STR: 1. Go to https://foofighters.com 2. Wait => The page implements a departure-like table. But this is a lot slower in Firefox than in Chrome to reach the end. When I say "a lot slower" it's really a 1:30 minute compared to 10 seconds. On Android this is several minutes even. This is a profile with latest Nightly: https://perfht.ml/2muvkHK I noticed we're especially slow at painting, with paints that can last up to 100ms on my computer. So I enabled "paint flashing" both in Chrome and Firefox. In Firefox we repaint the full table a lot, while in Chrome this never happens.
Created attachment 8841461 [details] foofighters.com.zip Here is the saved page. It's not exactly the same (some images are missing) but the same issue appears.
Mmm likely more Layout given the issue seems to be with the fact we're painting the full table.
FF51.0.1 does not show this behavior. I think it might be due to bug 1300659. If you look at a performance timeline for the site on FF51.0.1 you can see 25 consecutive setInterval() callbacks followed by ~50 requestAnimationFrame() callbacks. In FF54 instead it fires a single setInterval() callback and then two requestAnimationFrame() callbacks, style, layout, and paint before the next setInterval() callback is executed.
I'm a bit conflicted about what to do here. Normally I think we want to prioritize painting over running timer callbacks. In this case, though, painting is so much slower than the other work its basically saturating the main thread. The site is depending on timers getting prioritized higher to force more work through.
Obviously the site could avoid this situation by measuring how fast its running and flipping the correct number of letters every time its timer fires. Or running a pure requestAnimationFrame() for its animation instead of mixing setInterval/requestAnimationFrame. That would cause our slow painting to cause them to lose some animation steps (multiple letters flip at a time), but the overall animation would run in the same total time.
BTW, on my nexus 5x this site works a lot better in FF53 aurora compared to FF51 release.
[Tracking Requested - why for this release]: This is a user visible regression in sites using this animation technique. I spoke with Boris about it and he thinks we should uplift a mitigation before we go to release next week.
I have a patch that allows us to process more timer callbacks in RunTimeout(). Some results of timing how long it takes the site to get to the end of its animation: FF 51.0.1 baseline: no timer yielding = 10 seconds to complete FF54 with different cap values: 1 allowed callback = 33 seconds to complete 2 allowed callbacks = 16 seconds to complete 3 allowed callbacks = 15 seconds to complete 4 allowed callbacks = 14 seconds to complete 5 allowed callbacks = 14 seconds to complete The difference between 10 seconds and 14 seconds is likely due to the yielding caused by ThrottledEventQueue. Unfortunately, I can't easily modify that without completely disabling it. With these values we still avoid jank on this test case: https://people-mozilla.org/~bkelly/timer-flood/index.html I'd like to move forward with this change even though it doesn't exactly match FF51 release behavior. The 4 second difference here can be addressed by speeding up our layout/paint.
Created attachment 8841632 [details] [diff] [review] Run a limited number of timer callbacks in the same event loop runnable. r= smaug Olli, this site has a significant animation slowdown due to my timer yielding changes. This is occurring because the paints it triggers last 40ms to 60ms which means there is always a refresh driver runnable waiting to execute after a timer runs. So it paints after every timer callback instead of coallescing changes between paints. This patch mitigates the issue by allowing us to run a limited number of timers in the same runnable. The limit defaults to 5, but is configurable by a pref. (This will let us update the value in release if necessary without a point release.) Long term we should make RunTimeout() do something like this based on a time budget instead of fixed limit. Its hard to do, though, since RunTimeout() wants to know the number of timers up-front before any timers are invoked. I don't think we could uplift such changes. There is still some difference with release even with these patches due to the ThrottledEventQueue, but its greatly reduced. I'd like to go with this patch as a compromise solution as it still allows us to survive timer floods, etc.
That patch is a hack so we need to figure out how this all should work in different cases: - How should timers be handled when refresh driver tick takes too much time, - what the behavior should be when compositing is slow (I guess just run as many timers as possible) - what if the next refresh driver tick is postponed too much because of code running between ticks.
(In reply to Olli Pettay [:smaug] from comment #10) > That patch is a hack so we need to figure out how this all should work in > different cases: > - How should timers be handled when refresh driver tick takes too much time, > - what the behavior should be when compositing is slow (I guess just run as > many timers as possible) > - what if the next refresh driver tick is postponed too much because of code > running between ticks. What is a hack or not is a bit subjective when none of that is really spec'd. Realistically sites should not expect precise timer behavior if they are saturating the main thread with work. But, yes a time budget or something would be better. I'll file a follow-up.
Created attachment 8841760 [details] [diff] [review] FF52 uplift patch Verified this builds and works in FF52 locally.
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/e1a939a64f4b Run a limited number of timer callbacks in the same event loop runnable. r=smaug
Comment on attachment 8841632 [details] [diff] [review] Run a limited number of timer callbacks in the same event loop runnable. r= smaug Approval Request Comment [Feature/Bug causing the regression]: Timer changes in bug 1300659. [User impact if declined]: Some animations that depend on flooding the main thread run slower, although with fewer dropped frames. This patch is a mitigation to allow a limited number of timers to coalesce while still preventing floods from kill fps. [Is this code covered by automated tests?]: We have tests, but not for this specific condition. Its somewhat timing dependent. [Has the fix been verified in Nightly?]: I verified it locally in nightly and beta. Has not merged to m-c yet, though. [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: Separate mozilla-beta patch attached to this bug. [Is the change risky?]: Minimal risk [Why is the change risky/not risky?]: Its changing a conditional in a small controlled way. In addition, the patch includes a new preference to allow us to tweak the value in the future without requiring code changes. [String changes made/needed]: None
Comment on attachment 8841760 [details] [diff] [review] FF52 uplift patch See comment 14. This is a separate patch for FF52 branches.
Comment on attachment 8841632 [details] [diff] [review] Run a limited number of timer callbacks in the same event loop runnable. r= smaug work around an issue with some animations and timers, aurora53+
Comment on attachment 8841760 [details] [diff] [review] FF52 uplift patch work around an issue with animations and timers, beta/release 52+ This should be in 52 rc2
remote: https://hg.mozilla.org/releases/mozilla-beta/rev/17d42c8d8f209c6217826a790c3efa1d1f057c19 remote: https://hg.mozilla.org/releases/mozilla-release/rev/c9cd431edad441c7340a4d89daf30620a6e992e7
Setting qe-verify- based on Ben's assessment on manual testing needs (see Comment 14).
I see talos improvements on aurora when this landed there: == Change summary for alert #5291 (as of February 28 2017 14:11 UTC) == Improvements: 11% a11yr summary linux64 pgo e10s 461.2 -> 412.63 10% a11yr summary linux64 opt e10s 461.38 -> 413.4 9% a11yr summary osx-10-10 opt e10s 463.68 -> 420.38 8% a11yr summary windows7-32 pgo e10s 522.12 -> 479.32 8% a11yr summary windows8-64 pgo e10s 482.06 -> 442.58 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=5291
and it looks like tps improvements on trunk: == Change summary for alert #5271 (as of February 27 2017 22:33 UTC) == Improvements: 3% tps summary linux64 opt 48.66 -> 47.39 3% tps summary windows8-64 pgo 50.38 -> 49.1 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=5271
Joel, I still don't fully buy the talos changes based on the analysis I did in bug 1315895 comment 12.
we see a11y improvements on beta as well, cool :)
did this not make it into the 54 merge to aurora last week? I see an improvement, then a regression: https://treeherder.mozilla.org/perf.html#/graphs?timerange=5184000&series=%5Bmozilla-aurora,054905c426e16aca2e90a056152d32e9f9616511,1,1%5D&series=%5Bautoland,054905c426e16aca2e90a056152d32e9f9616511,0,1%5D&series=%5Bmozilla-inbound,054905c426e16aca2e90a056152d32e9f9616511,1,1%5D
This landed two weeks, well before the merge. See comment 16. Note, I also tweaked this algorithm in bug 1343877. However, as I tried to explain in comment 25, I still think these talos changes are a bit of a red herring.
Compared to Google Chrome on same website(https://foofighters.com), the performance seems still behind. Do we have any todos remaining in this bug ?
We can speed up layout/rendering like in bug 929484. As I mentioned in comment 9, though, my changes were just a mitigation and did not completely bring this site to parity. I wrote more about the tradeoffs involved here in: https://blog.wanderview.com/blog/2017/03/13/firefox-52-settimeout-changes/
It seems my mitigation landed here is no longer effective for some reason. The mechanism is still working, though, because I can see a change on this demo when modify the pref: https://mozdevs.github.io/servo-experiments/experiments/tiles/ I also tested with the fix from bug 929484 and it didn't help either. We probably need to re-profile this once bug 929484 hits nightly.