Closed Bug 1118034 Opened 9 years ago Closed 9 years ago

40% TART/CART Regression on Inbound (v.37) Jan 5 from push 7d7ac60fb354

Categories

(Core :: Layout, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: birtles, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Bug 927349 changes the way animations start. Instead of beginning immediately, they wait until their first frame is painted and then start from that time.

That's caused a sudden jump on Tab Animation Test and Customization Animation Test results (up to ~40% regression).

Tab Animation Test:
http://graphs.mozilla.org/graph.html#tests=%5B%5B293,131,37%5D,%5B293,63,21%5D,%5B293,131,31%5D,%5B293,131,35%5D,%5B293,131,33%5D%5D&sel=1420373448000,1420546248000&displayrange=7&datatype=running

Customization Animation Test:
http://graphs.mozilla.org/graph.html#tests=%5B%5B309,131,33%5D,%5B293,63,21%5D,%5B293,131,33%5D,%5B293,131,35%5D,%5B293,131,31%5D%5D&sel=1420373592000,1420546392000&displayrange=7&datatype=running

I'm not sure how to break down exactly what numbers have changed. Avi, can you help?

Depending on what is going on we may need to one or more of the following:

* Change what we're measuring to no longer include the initial paint time when measuring animation smoothness / animation duration.
* If the tab animation is specifically trying to work around dropped frames caused by the initial paint, then we might need to update the tab animation to stop doing this (since bug 927349 does this for us)
* If the tab animation is intended to skip frames when painting is slow then we may need to add a chrome-only override to avoid the behavior added in bug 927349 (e.g. -moz-animation-start-behavior: smooth | instant).

I could use some help understanding what these numbers represent and what effect this change has had on perceived performance. Again, Avi, can you help here?
Flags: needinfo?(avihpit)
(In reply to Brian Birtles (:birtles) from comment #0)
> I'm not sure how to break down exactly what numbers have changed. Avi, can
> you help?

Unfortunately we don't have tools right now to display which sub-tests changed. DataZilla used to do this, but it's not working or maintained these days, and tree herder is not there yet.

The closest to that right now would be compare-talos http://compare-talos.mattn.ca/

> * Change what we're measuring to no longer include the initial paint time
> when measuring animation smoothness / animation duration.

Not sure I understand why would you suggest that. Some of the TART (and CART) tests measure the "lag" of the animation start (implicitly - by measuring how long it takes the animation to complete from the trigger until it's done), and if as a result it now takes an extra 20ms (for instance), then that's a real change which the user might notice, and so it should be reflected at the numbers.

The fact that this change is by design (or possibly implied design) doesn't mean it didn't happen or that the user might not notice it, and therefore it's actually important that we keep measuring it - exactly to be able to track how changes like bug 927349 affects the behavior.

The test's only goal is to alert people to changes and to allow tracking the behavior. There's no hard threshold beyond which a regression value is unacceptable (though it might raise some eyebrows if you'd choose to ignore a big change in numbers without decent explanation on why it should be acceptable).

You as as developer should take that decision - by having the full view of how the numbers changed.


> * If the tab animation is specifically trying to work around dropped frames
> caused by the initial paint, then we might need to update the tab animation
> to stop doing this (since bug 927349 does this for us)

Just to clarify, when you say "tab animation" - do you mean the animation itself? or the test? - I assume the former.

I don't think there's an explicit mechanism to work around dropped frames in tab animation, but it animates so implicitly anyway - like any other CSS animation. I.e. if on your system the animation can't keep up with 60Hz, then it would draw as fast as it can, each drawn frame according to the the time elapsed since time counting has started for this animation.


> * If the tab animation is intended to skip frames when painting is slow then
> we may need to add a chrome-only override to avoid the behavior added in bug
> 927349 (e.g. -moz-animation-start-behavior: smooth | instant).

Not sure I get this one. You want tab animation to be excluded from such behavior (which is the standard in animation) and still draw all frames even if on a specific system it just can't render more than 1 tab animation frame per second, for instance?

Why would tab animation behave differently than other animations?


> I could use some help understanding what these numbers represent and what
> effect this change has had on perceived performance. Again, Avi, can you
> help here?

The description of what the numbers represent is here: https://wiki.mozilla.org/Buildbot/Talos/Tests#TART.2FCART

But like I mentioned earlier, other than the unofficial compare-talos, we don't have tools which could show us how each sub-test changed due to this.
Flags: needinfo?(avihpit)
(In reply to Avi Halachmi (:avih) from comment #1)
> (In reply to Brian Birtles (:birtles) from comment #0)
> > I'm not sure how to break down exactly what numbers have changed. Avi, can
> > you help?
> 
> Unfortunately we don't have tools right now to display which sub-tests
> changed. DataZilla used to do this, but it's not working or maintained these
> days, and tree herder is not there yet.
> 
> The closest to that right now would be compare-talos
> http://compare-talos.mattn.ca/

I'll give that a try, thanks.

> > * Change what we're measuring to no longer include the initial paint time
> > when measuring animation smoothness / animation duration.
> 
> Not sure I understand why would you suggest that. Some of the TART (and
> CART) tests measure the "lag" of the animation start (implicitly - by
> measuring how long it takes the animation to complete from the trigger until
> it's done), and if as a result it now takes an extra 20ms (for instance),
> then that's a real change which the user might notice, and so it should be
> reflected at the numbers.

Great. I was concerned that we might be calculating the FPS based on the number of frames observed between triggering the animation and when it finishes. From your comments though it appears we're not doing that so there's no need to update the test.

> The fact that this change is by design (or possibly implied design) doesn't
> mean it didn't happen or that the user might not notice it, and therefore
> it's actually important that we keep measuring it - exactly to be able to
> track how changes like bug 927349 affects the behavior.

> > * If the tab animation is specifically trying to work around dropped frames
> > caused by the initial paint, then we might need to update the tab animation
> > to stop doing this (since bug 927349 does this for us)
> 
> Just to clarify, when you say "tab animation" - do you mean the animation
> itself? or the test? - I assume the former.

Yes, the former. For example, in Firefox OS we use a number of hacks to avoid dropping the initial frames of an animation. The plan is that after bug 927349 lands we can drop those hacks.

My concern is that we could be doing something similar here. For example, forcing an element to be layerized before triggering its animation. After bug 927349 lands the end result would be that such hacks could make the animation start more slowly than it needs to (since we'd first layerize, then trigger the animation, then wait for it to paint, then start it). It seems like we're not doing that here so we're ok.

> > * If the tab animation is intended to skip frames when painting is slow then
> > we may need to add a chrome-only override to avoid the behavior added in bug
> > 927349 (e.g. -moz-animation-start-behavior: smooth | instant).
> 
> Not sure I get this one. You want tab animation to be excluded from such
> behavior (which is the standard in animation) and still draw all frames even
> if on a specific system it just can't render more than 1 tab animation frame
> per second, for instance?
> 
> Why would tab animation behave differently than other animations?

For some kinds of interaction it may be more desirable that the animation finishes within a certain amount of time rather than display smoothly, even if that means that on some systems that means the whole animation gets skipped.

I don't know what the UX folks' requirements were when tuning tab animations but if, for example, they required "the new tab button must be fully displayed after X ms even if that means dropping all animation frames" then we'd need to add in a means for producing that behavior.

> > I could use some help understanding what these numbers represent and what
> > effect this change has had on perceived performance. Again, Avi, can you
> > help here?
> 
> The description of what the numbers represent is here:
> https://wiki.mozilla.org/Buildbot/Talos/Tests#TART.2FCART
> 
> But like I mentioned earlier, other than the unofficial compare-talos, we
> don't have tools which could show us how each sub-test changed due to this.

Ok, I'll try my luck with compare-talos and see if I can work out exactly how performance changed with this bug. Thanks for your help!
(In reply to Brian Birtles (:birtles) from comment #2)
> > > * Change what we're measuring to no longer include the initial paint time
> > > when measuring animation smoothness / animation duration.
> > 
> > Not sure I understand why would you suggest that. Some of the TART (and
> > CART) tests measure the "lag" of the animation start (implicitly - by
> > measuring how long it takes the animation to complete from the trigger until
> > it's done), and if as a result it now takes an extra 20ms (for instance),
> > then that's a real change which the user might notice, and so it should be
> > reflected at the numbers.
> 
> Great. I was concerned that we might be calculating the FPS based on the
> number of frames observed between triggering the animation and when it
> finishes. From your comments though it appears we're not doing that so
> there's no need to update the test.

There are several sub-tests. The *.all values actually do calculate the average frame interval as the overall duration from trigger to completion divided by number of frames.

But I still don't see how it's undesirable to reflect such value.


> Yes, the former. For example, in Firefox OS we use a number of hacks to
> avoid dropping the initial frames of an animation. The plan is that after
> bug 927349 lands we can drop those hacks.
> 
> My concern is that we could be doing something similar here. For example,
> forcing an element to be layerized before triggering its animation. After
> bug 927349 lands the end result would be that such hacks could make the
> animation start more slowly than it needs to (since we'd first layerize,
> then trigger the animation, then wait for it to paint, then start it). It
> seems like we're not doing that here so we're ok.

Not that I know of, and AFAIK it doesn't, but I didn't implement tab animations. In general that's fx-team domain.


> For some kinds of interaction it may be more desirable that the animation
> finishes within a certain amount of time rather than display smoothly, even
> if that means that on some systems that means the whole animation gets
> skipped.
> 
> I don't know what the UX folks' requirements were when tuning tab animations
> but if, for example, they required "the new tab button must be fully
> displayed after X ms even if that means dropping all animation frames" then
> we'd need to add in a means for producing that behavior.

TBH I don't think there was ever such detailed requirements by UX.

My personal opinion is that it should follow the common (or Firefox's) animation standards - which is to respect the deadline pretty much as the highest requirement, with the new exception due to bug 927349 that it's allowed to start the stopwatch when the first frame is displayed - which could be a bit later than when the user requested it to start.
Trying compare-talos it seems like the regression is pretty much entirely in the 'error' number which represents the "difference between the designated duration and the actual completion duration from the trigger." (https://wiki.mozilla.org/Buildbot/Talos/Tests#TART.2FCART)

That's entirely expected since the actual completion duration should be longer now that we wait for the animation to get ready before starting it.

Specifically, I'm looking at:

  http://compare-talos.mattn.ca/?oldRevs=30bec23b21b3&newRev=7d7ac60fb354&server=graphs.mozilla.org&submit=true

* For Ubuntu 32/64, Mac 10.8, Windows 7 and Windows 8 pretty much all the measures are green except the 'error' number. There are occasional regressions in other measures but they are very small in magnitude, particularly compared to the 'error' number.
* Mac 10.6, and Windows XP to a lesser extent, have red in a number of other places but never of the same magnitude as the 'error' number. I'm not quite sure what's going on there.

I think it's fair to conclude that the reported regression is simply due to the extra time taken to begin the animation and is to be expected.
As discussed with Avi on IRC, I think the TART/CART measurements are still valid as measures of responsiveness.

Also, I think that this regression is to be expected based on the analysis in comment 4. Our initial experience with B2G is that the animation behavior provided by bug 927349 represents a significant improvement in perceived performance. I'm going to mark this bug as wontfix for now. Once this hits nightly, if users perceive this change as a performance regression then we can re-open this bug at that point.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
(In reply to Brian Birtles (:birtles) from comment #5)
> Also, I think that this regression is to be expected based on the analysis
> in comment 4. Our initial experience with B2G is that the animation behavior
> provided by bug 927349 represents a significant improvement in perceived
> performance. I'm going to mark this bug as wontfix for now. Once this hits
> nightly, if users perceive this change as a performance regression then we
> can re-open this bug at that point.

While it's expected, the magnitude in percentage of the changed numbers is quite high - hundreds of % typically.

However, looking at the absolute values together with understanding what it means, we're seeing the animation ending "late" by at most one frame worth of ms typically, which should be for the most part unnoticeable, and with the added values of bug 927349.

I say yay! :)
Resolution: WONTFIX → FIXED
Whoops..

Accidentally changed to FIXED. Back to WONTFIX.
Resolution: FIXED → WONTFIX
Here is a list of alerts on alert manager for future reference:
http://alertmanager.allizom.org:8080/alerts.html?rev=7d7ac60fb354&showAll=1&testIndex=0&platIndex=0
Blocks: 1108235
Keywords: perf, regression
Summary: Regression in Tab Animation Test and Custom Animation Tests after changing animation start behavior → 40% TART/CART Regression on Inbound (v.37) Jan 5 from push 7d7ac60fb354
Whiteboard: [talos_regression]
You need to log in before you can comment on or make changes to this bug.