Multi-second browser jank after transition between normal and full-screen video

RESOLVED FIXED in Firefox 63

Status

()

P1
normal
RESOLVED FIXED
5 months ago
4 months ago

People

(Reporter: aklotz, Assigned: smaug)

Tracking

(Blocks: 1 bug, {perf, regression})

Trunk
mozilla64
perf, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox61 unaffected, firefox62 unaffected, firefox63+ fixed, firefox64 fixed)

Details

(Whiteboard: [qf:p1:f64])

Attachments

(2 attachments)

(Reporter)

Description

5 months ago
Today's Nightly, Build 20180816100035, Windows 10 x64

https://perfht.ml/2OHccU6


STR is pretty easy, just go to YouTube, play a video, switch to fullscreen. Wait for the video to play full screen, then switch back.
ServoStyleSet::ResolveServoStyleByAddingAnimation took 300ms there.

https://perfht.ml/2MTp5tK

I think this is a recent regression.
Keywords: regressionwindow-wanted
(Reporter)

Updated

5 months ago
Keywords: perf, regression
Alice, I don't suppose you have a chance to look for a regression window for this recent perf regression?

Comment 3

5 months ago
what/how browserjank?
Yeah, I don't see any jank. Aaron, are you still able to reproduce this?
Flags: needinfo?(aklotz)
(Reporter)

Comment 5

5 months ago
Yes. Build 20180821100053
Flags: needinfo?(aklotz)
(Reporter)

Comment 6

5 months ago
I am a tab hoarder -- could having a large number of tabs be somehow be factoring into this?
Could be.  I hadn't noticed that the ResolveServoStyleByAddingAnimation happened on chrome window.  And it was from _layOutTitlebar in browser-tabsintitlebar.js.  We might want to stop triggering any transitions in tabbar during fullscreening or un-fullscreening.

CCing some browser folks.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #7)
> Could be.  I hadn't noticed that the ResolveServoStyleByAddingAnimation
> happened on chrome window.  And it was from _layOutTitlebar in
> browser-tabsintitlebar.js.  We might want to stop triggering any transitions
> in tabbar during fullscreening or un-fullscreening.

Hmm, _layOutTitlebar is called from a callback of fullscreen event, and the fullscreen event is dispatched before styling.  Should we use promiseDocumentFlushed there?  :mconley?
Flags: needinfo?(mconley)
I think that synchronous flush is there to avoid this sort of bug:

https://bugzilla.mozilla.org/attachment.cgi?id=8620954

(from bug 1173768).

I suspect that wrapping that TabsInTitlebar.update inside a pDF would result in momentary glitchiness in the UI.

Coincidentally, I'm looking for ways to move more of the TabsInTitlebar layout calculations out of JS and express them as CSS in bug 1356920, but that bug is in its infancy.

We should probably try to find out why this got worse recently.
Flags: needinfo?(mconley)
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p1:f64]
(In reply to Mike Conley (:mconley) (:⚙️) from comment #9)
> I think that synchronous flush is there to avoid this sort of bug:
> 
> https://bugzilla.mozilla.org/attachment.cgi?id=8620954
> 
> (from bug 1173768).
> 
> I suspect that wrapping that TabsInTitlebar.update inside a pDF would result
> in momentary glitchiness in the UI.
> 
> Coincidentally, I'm looking for ways to move more of the TabsInTitlebar
> layout calculations out of JS and express them as CSS in bug 1356920, but
> that bug is in its infancy.
> 
> We should probably try to find out why this got worse recently.

Indeed.  Note that your comment reminded me that we freezes refresh driver (which means that most events are not dispatched, and style/layout/paint triggered by the refresh driver's tick are not processed) during fullscreen process.  So there might be a bunch of pending events/styles/layouts/paints anyway.
If you're not going to be taking this, who should we redirect it to?
Flags: needinfo?(hikezoe)
It's not yet clear to us whether this is a recent regression or not.
Aaron, can you please take time to make sure whether this is a recent regression or not?  And if it's a regression it would be nice to narrow down it. I think you are the only person to reproduce this.
Flags: needinfo?(hikezoe) → needinfo?(aklotz)
(Reporter)

Comment 13

5 months ago
I'll try...
(Reporter)

Updated

5 months ago
Depends on: 1487262
(Reporter)

Comment 14

5 months ago
regression-window
According to mozregression, the bad commit is either:

https://hg.mozilla.org/integration/mozilla-inbound/rev/ca936adbc565
or
https://hg.mozilla.org/integration/mozilla-inbound/rev/b5f6cc47313d

But because both commits were part of the same push, that's the best I can give you.
Flags: needinfo?(aklotz)
(Reporter)

Updated

5 months ago
No longer depends on: 1487262
Thank you Aaron! That's a great help!

Olli, your change (I guess it's bug 1469521) seems to make the performance of ensure_child() worse.  Can you please take a look?  In the profile in comment 1, we spend 300ms in ensure_child() and I can see a prev_sibling call inside a loop in the function.
Blocks: 1469521
Flags: needinfo?(bugs)
(Reporter)

Updated

5 months ago
Keywords: regressionwindow-wanted
That doesn't really make sense. `ensure_child` is rule tree stuff, the previous sibling there is not related to DOM nodes whatsoever.
Chances are that that change _does_ cause the difference, specially based on comment 6 & co, but it's very unlikely that it's because of those 300ms in ensure_child...
So bug 1469521 _is_ the right regression range, but not because of the style system stuff mentioned in comment 15. Most of the time spent in the profile in comment 0 is in CompareDocumentPosition, which is an operation which bug 1469521 did obviously regress.
(Reporter)

Comment 19

5 months ago
In particular, most of the "self" time is being consumed by nsINode::ComputeIndexOf.
Thank you, Emilio, as usual. :)  (I was about to do NI to you in comment 15, didn't know you were in CC list in that bug.)

Moving into Core::DOM.
Component: DOM: Animation → DOM
So this is AnimationEventDispatcher::DispatchEvents taking a long time to sort the events... That can only happen when we have a bunch of transitions on siblings. So I'd consider to keep this as a DOM: Animation bug for now.

Looks like we're firing a transition per tab and we're struggling there, should be pretty easy to construct a test-case that demonstrates the issue.

So, possible fixes:

 * Fix the front-end so that leaving full-screen doesn't trigger a gazillion transitions on each tab. This is the less desirable fix, but also probably easier?

 * Fix it so that we reduce calls to CompareDocumentPosition by walking the sibling list until we see the other sibling instead of computing two indices. That should reduce the work in half, maybe. That may be worth doing.

 * Can we fix the transitions stuff to be smarter as well? Chances are that in most cases the array is already sorted, can we detect that and avoid the work of sorting it and doing all this expensive work?
Created attachment 9005080 [details]
Testcase.

It indeed was easy to come up with a test-case for this.
Attachment #9005080 - Attachment mime type: text/plain → text/html
(Reporter)

Comment 23

5 months ago
[Tracking Requested - why for this release]: Perf regression
status-firefox62: --- → unaffected
tracking-firefox63: --- → ?
(Reporter)

Comment 24

5 months ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #21)
>  * Fix the front-end so that leaving full-screen doesn't trigger a gazillion
> transitions on each tab. This is the less desirable fix, but also probably
> easier?

Note that entering full-screen is also affected.
NI to me to figure out a way to make the animation events sort issue better.
Flags: needinfo?(hikezoe)
(In reply to Emilio Cobos Álvarez (:emilio) from comment #21)
>  * Can we fix the transitions stuff to be smarter as well? Chances are that
> in most cases the array is already sorted, can we detect that and avoid the
> work of sorting it and doing all this expensive work?

In the test case in comment 22, the events are a bunch of chunks of ordered events. E.g. '1 2 3 4 5', '10 11 12 13', etc.  It is due to we traverse those chunk of elements on different threads, I guess.  I wonder each element can have an index for DOM tree order during traversal, and use the index for event sorting?  The other idea I can think of is similar to what Emilio mentioned in comment 21.  We can check both elements have the same parent in nsINode::CompareDocumentPosition in this particular test case.

Keep NI to me.
(Assignee)

Updated

5 months ago
Component: DOM → DOM: Animation
(Assignee)

Comment 27

5 months ago
Looks like OwningElementRef or some such should definitely cache the index during sorting, and then we could have a version of nsContentUtils::PositionIsBefore which takes nodes and their index in parent as params.
That should reduce ComputeIndexOf call quite a bit.
status-firefox61: --- → unaffected
status-firefox-esr52: --- → unaffected
status-firefox-esr60: --- → unaffected
tracking-firefox63: ? → +
Marking this significant perf regression as P1. It sounds like we can fix it on the Animation side (presumably we think the regression to CompareDocumentPosition from bug 1469521 is acceptable).
Priority: -- → P1
(Assignee)

Comment 29

5 months ago
I guess I could add back some kind of cache for ComputeIndexOf
Assignee: nobody → bugs
(Assignee)

Comment 30

5 months ago
Let's see what tryserver thinks of these. 
- A cache for ComputeIndexOf
- reorder member variables of Animation a bit to use less memory
- cache child index in Animation when reordering

remote: View your changes here:
remote:   https://hg.mozilla.org/try/rev/86900f210d1f2eb640397735e6525e8f36622dbd
remote:   https://hg.mozilla.org/try/rev/30f5228c4a0b2a9792ef820edde905a47dea22d5
remote:   https://hg.mozilla.org/try/rev/6768f8c1b1a2318b5606d355a29ef7aeae6c54c7
remote: 
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=6768f8c1b1a2318b5606d355a29ef7aeae6c54c7
remote: recorded changegroup in replication log in 0.013s
Flags: needinfo?(bugs)
(Assignee)

Updated

5 months ago
Depends on: 1488278
(Assignee)

Updated

5 months ago
Depends on: 1488279
(Assignee)

Comment 31

5 months ago
Created attachment 9006089 [details] [diff] [review]
cache_index_in_animation.diff

this is perhaps not the most beautiful, but we basically must cache the index here to get good performance.
Other thing to fix is bug 1488278 - I'll upload a patch there in a minute.

ehsan, could you take a look at the dom/base/* and hiro for the rest
Attachment #9006089 - Flags: review?(hikezoe)
Attachment #9006089 - Flags: review?(ehsan)
Comment on attachment 9006089 [details] [diff] [review]
cache_index_in_animation.diff

Review of attachment 9006089 [details] [diff] [review]:
-----------------------------------------------------------------

Animation part looks good to me.
Thanks for fixing this!!
Attachment #9006089 - Flags: review?(hikezoe) → review+

Comment 33

5 months ago
Comment on attachment 9006089 [details] [diff] [review]
cache_index_in_animation.diff

Review of attachment 9006089 [details] [diff] [review]:
-----------------------------------------------------------------

Please nominate for uplift if needed.
Attachment #9006089 - Flags: review?(ehsan) → review+
No longer depends on: 1488278
Depends on: 1488278

Comment 34

5 months ago
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a2a0dbf6d6e9
cache the index of a child node when ordering animations for event dispatch, r=hiro,ehsan
Flags: needinfo?(hikezoe)
(Assignee)

Comment 35

5 months ago
Comment on attachment 9006089 [details] [diff] [review]
cache_index_in_animation.diff

Approval Request Comment
[Feature/Bug causing the regression]: bug 651120
[User impact if declined]: Jank
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: Landed to m-i yesterday, tested locally
[Needs manual test from QE? If yes, steps to reproduce]: I'd say no
[List of other uplifts needed for the feature/fix]: bug 1488279 and bug 1488278
[Is the change risky?]: The change shouldn't be risky 
[Why is the change risky/not risky?]: just caching a return value of a method call temporarily 
[String changes made/needed]: NA
Attachment #9006089 - Flags: approval-mozilla-beta?
https://hg.mozilla.org/mozilla-central/rev/a2a0dbf6d6e9
Status: NEW → RESOLVED
Last Resolved: 5 months ago
status-firefox64: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
(Reporter)

Comment 37

5 months ago
This is significantly better than before, but I am still setting ~400ms jank after leaving full-screen mode.

Is this just something that we have to live with when there are high numbers of open tabs?
https://perfht.ml/2oGKWtU
(Assignee)

Comment 38

5 months ago
I was using the testcase for profiling, since I can't reproduce the issue using full-screen.
(Assignee)

Comment 39

5 months ago
But I'd say no, we don't have to live with it. If there is some testcase for profiling, we could probably improve things. Not yet sure how though.
Comment on attachment 9006089 [details] [diff] [review]
cache_index_in_animation.diff

Review of attachment 9006089 [details] [diff] [review]:
-----------------------------------------------------------------

Noticabmle perf regression, approved for 63 beta
Attachment #9006089 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 41

5 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/1b236230b856
status-firefox63: affected → fixed
(Reporter)

Updated

4 months ago
Blocks: 1490510
You need to log in before you can comment on or make changes to this bug.